Music Recommendation System Based On Emotions


Master’s in Multimedia - Specialization in Technology

João Pedro dos Santos Figueiredo

2015

Participating faculties: Faculdade de Engenharia, Faculdade de Belas Artes, Faculdade de Ciências, Faculdade de Economia, Faculdade de Letras


Master in Multimedia of the University of Porto

Supervisor: Marcelo Caetano (PhD)


Approved in public examination by the committee:

Chair: Ana Cristina Costa Aguiar (PhD, Professor)

External Examiner: Rui Pedro Pinto de Carvalho e Paiva (PhD, Professor)

Supervisor: Marcelo Caetano (PhD)


Summary

Most music recommendation systems focus solely on music similarity techniques to suggest songs, completely ignoring the listener’s emotional context. This thesis describes a system that proposes a recommendation strategy that orders songs by the proximity of their annotations on the Valence-Arousal emotional plane. A second strategy, which applies psychoacoustic models, is also considered; it models songs through the extraction of Mel Frequency Cepstral Coefficients. Additionally, an experiment is proposed to evaluate the quality of the recommendations according to the listener’s emotional perception of the music. The two recommendation strategies are then compared to determine which performs better. The results show that recommendations based on emotional annotations perform better, and that neither the user profile nor the distribution of the emotions selected by listeners on the two-dimensional emotion plane influenced the evaluation of the recommendations. Finally, several suggestions are made on how to continue and optimize this work, with a view to a possible future application in an academic context and/or in a commercial product.


Abstract

Most music recommendation systems focus solely on music similarity techniques to suggest songs, disregarding the listener’s emotional context. This thesis describes a system that proposes a recommendation strategy that orders songs by music annotation proximity on the Valence-Arousal emotion model. It also considers a second strategy that applies psychoacoustic models, modeling songs through the extraction of Mel Frequency Cepstral Coefficients. Additionally, this thesis proposes an experiment that evaluates the quality of the recommendations according to the listener’s emotional perception of the music. Both recommendation strategies are then compared to understand which performs best. The results demonstrate that recommendations that use emotional annotations perform better, and that neither the user profile nor the distribution of emotions selected by listeners on the two-dimensional plane of emotions influences the evaluation of recommendations. To conclude, several suggestions on how to continue and improve this work are made, aiming at the future application of this approach in an academic context and/or in a commercial product.


Acknowledgments

First of all I sincerely thank my supervisor Prof. Dr. Marcelo Caetano. His guidance, constant accompaniment, support and patience were remarkable.

I also thank the Sound and Music Computing Group at INESC Porto, specifically Prof. Dr. Rui Penha and Dr. Matthew Davies, for supporting and trusting me, and for embracing this idea. Prof. Luís Filipe Teixeira, many thanks for allowing me to start the development of the system in the Introduction to Programming course, even though it was not in the initial study plan.

A special thank you to Alexandre Clément, Mariana Owen, Fábio Martins and Filipe Ferreira for their contribution to my work. To all the people who offered their time to try out my system, I would like to thank you all. To all my colleagues at FEUP, I appreciate all of the good times and sharing this experience with you.

To Pixelmatters, especially to André Oliveira, thanks for believing and investing in me. Your commitment, flexibility, patience and understanding of what it means to work and study at the same time deserve my profound respect.

Last but definitely not least, to my parents, sister and aunt, for always being there for me. You all know the importance and significance of moving to Porto and initiating this master.

To my love Rita, you were the one who made this possible. Your patience and wise contributions are admirable. You were the fountain of energy that allowed me to carry on with this work every single day.


Index

Introduction
1.1 Problems and Research Aims
1.2 Thesis Outline
1.3 Thesis Contributions

State of the Art
2.1 Study of Music and Emotion
2.2 Recommender Systems
2.3 Music Recommendation Systems
2.4 Music Emotion Recognition on Recommender Systems

A music recommendation system based on emotions
3.1 Overview
3.2 Description
3.2.1 Requisites
3.2.2 Emotion in Music Database
3.2.3 Interface and Interaction
3.2.4 Recommendation Strategies

Evaluation
4.1 Overview
4.2 Experiments
4.3 Results
4.4 Discussion
4.5 Conclusions
4.6 Future Work

References


List of Figures

Figure 1: Hevner’s adjective circle (Hevner 1936). Source: (Caetano and Wiering 2012)

Figure 2: Russell’s circumplex model of Affect (Russell 1980). Source: (Caetano and Wiering 2012)

Figure 3: Framework: Role of emotions in user interaction with a recommender system. Source: (Tkalcic, Kosir et al. 2011)

Figure 4: Distribution of songs throughout the plane

Figure 5: System’s instructions screen

Figure 6: System interface

Figure 7(A): Example of the algorithmic approach that uses a fixed reference. 7(B): Example of the algorithmic approach that uses a moving reference.

Figure 8: Steps to calculate MFCC features. Source: (Logan 2000)

Figure 9(A): Number of users per age interval, gender and musical education. 9(B): Percentage of musical genres selected by users

Figure 10: Distribution of the points marked by the user on the plane

Figure 11: Distribution per quadrant of all songs suggested on R1

Figure 15: Percentage of recommendations repeated

Figure 16(A): Mean and standard deviation of the evaluations for each recommendation. 16(B): Mean and standard deviation of the evaluations for each recommendation per quadrant.

Figure 17: Relation of the recommendations evaluations mean and standard deviation with the distance of the points marked by users to the center of the plane for each recommendation

Figure 18: Mean and standard deviation of the evaluations of each recommendation per age interval

[...] for users that have musical education

Figure 21: Relation of the recommendations evaluations means and standard deviations with the number of songs per recommendation that match users’ [...]

List of Tables

Table 1: Relation of each recommendation with the algorithm and similarity measure


Abbreviations and Symbols

CSV     Comma-separated values
ECG     Electrocardiography
EEG     Electroencephalography
EGG     Electrogastrogram
EMG     Electromyography
FEUP    Faculdade de Engenharia da Universidade do Porto
FPA     Finger pulse amplitude
I/O     Input/output
INESC   Instituto Nacional de Engenharia de Sistemas e Computadores
KL      Kullback–Leibler divergence
OS X    Unix-based graphical interface operating systems developed by Apple Inc
MER     Music Emotion Recognition
MFCC    Mel Frequency Cepstral Coefficients
MIR     Music Information Retrieval
MP3     MPEG-1 or MPEG-2 Audio Layer III
MPEG    Moving Picture Experts Group
RS      Recommender System
V-A     Valence-Arousal emotion model
WAVE    Waveform Audio File Format


Chapter 1 Introduction

1.1 Problems and Research Aims

With the emergence of digital music services and Web 2.0 technologies, the digital music catalog has grown substantially and continues to expand quickly. The amount of content is too large to keep up with (currently amounting to more than 20 million songs), and this content is also very diverse. For both reasons, millions of people around the globe have difficulty finding musical content that matches their individual preferences.

Over the years, researchers and companies have addressed this problem by developing solutions to propose content to users using different recommendation strategies. These solutions are known as recommender systems.

However, these systems base most of their strategies on musical similarity approaches, which use only music content and context, such as filtering by musical genre, similarity between artists and extraction of acoustic features, among others. This type of approach helps reduce the issues related to the large amount of content and its diversity, but completely ignores the user context, e.g. the listening context and mood. Recommendation criteria that focus solely on music similarity are subjective, as they may or may not meet the user’s expectations. A recommendation strategy for a group of users within the same context will hardly please all of them, and a recommendation strategy for a single user in different contexts will probably produce different reactions.

Therefore, the overall goal of this research is the development of a music recommendation system that focuses its recommendation strategies on emotions, suggesting music to users according to their context and, subsequently, increasing the possibility of meeting users’ expectations.

Additionally, a second research aim is to compare the two types of recommendation strategy: those based on music similarity and those based on the user context.


1.2 Thesis Outline

This thesis consists of four chapters, including this one, which serves as an introduction to the overall context and motivation for the thesis.

Chapter 2 is divided into four major points and reviews the state of the art of music emotion recognition in music recommendation systems. The first point, Study of Music and Emotion, reflects on the idea that music induces and/or expresses emotions and provides a general analysis of this relation in different fields. This point also lists several techniques to measure emotions and describes models of emotion that can be used in the context of music description. The second point, Recommender Systems, describes the way they are applied in a commercial context and in research, considering diverse computer science fields. The third point, Music Recommendation Systems, builds on the previous point and adds a new layer of knowledge by focusing on the music domain. The last point, Music Emotion Recognition on Recommender Systems, reinforces the concept of music emotion recognition and considers the subject within recommender systems.

Chapter 3 describes the features and development stages of one of the goals of this research, the music recommendation system.

Chapter 4 describes the experiment that was performed and how the data was collected. It also reflects on and discusses the results obtained, presenting a final conclusion. Finally, several suggestions on how this work may be improved and continued are proposed.

1.3 Thesis Contributions

The contributions are divided into three major topics:

1. Development of a music recommendation system that uses emotions to make recommendations of groups of songs. The system is the platform that hosts the recommendation algorithms and strategies.

2. The mentioned recommendation system includes two types of recommendation strategies:

• The first type, and the one that this research proposes, is a recommendation strategy that orders songs by music annotation proximity on the Valence-Arousal emotion model.

• The second type is based on music content-based similarity and applies psychoacoustic models, such as Mel Frequency Cepstral Coefficients (MFCCs) extraction, which takes into account timbral features. By featuring a standard recommendation approach usually available on music recommendation systems, it is possible to proceed with a comparison between both strategies.

3. By developing a music recommendation system and recommendation strategies that use emotions, this research contributes solutions that take the user context into account.
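
As an illustration of the first strategy in contribution 2, ordering songs by annotation proximity on the Valence-Arousal plane could be sketched as follows. This is a minimal sketch and not the system’s actual implementation: the `Song` type, the function `rank_by_va_proximity`, the [-1, 1] coordinate range, the example annotations and the choice of Euclidean distance as the proximity measure are all assumptions made here for illustration.

```python
import math
from typing import List, NamedTuple


class Song(NamedTuple):
    """A song with a (hypothetical) emotion annotation on the V-A plane."""
    title: str
    valence: float  # assumed to lie in [-1, 1]
    arousal: float  # assumed to lie in [-1, 1]


def rank_by_va_proximity(songs: List[Song], valence: float,
                         arousal: float, k: int = 5) -> List[Song]:
    """Return the k songs whose annotations are closest to the given point.

    Euclidean distance is an assumption of this sketch; any other
    distance on the plane could be substituted.
    """
    def distance(song: Song) -> float:
        return math.hypot(song.valence - valence, song.arousal - arousal)

    return sorted(songs, key=distance)[:k]


# Hypothetical annotations, invented for this example.
catalog = [
    Song("calm_piano", 0.6, -0.7),
    Song("upbeat_pop", 0.8, 0.7),
    Song("angry_metal", -0.7, 0.8),
    Song("sad_ballad", -0.6, -0.5),
]

# The listener marks a point in the positive-valence, high-arousal region.
playlist = rank_by_va_proximity(catalog, valence=0.7, arousal=0.6, k=2)
print([song.title for song in playlist])  # ['upbeat_pop', 'calm_piano']
```

The reference point here is fixed at the position marked by the listener; a variant in which the reference moves after each selected song would only change which point is passed to the distance function.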


Chapter 2 State of the Art

2.1 Study of Music and Emotion

Since the earliest days of mankind, music has played a major role in society. Whatever the context, be it cultural, social or geographical, music is celebrated and experienced by many in their daily lives, e.g. at events such as parties, weddings and funerals, and during activities such as exercising, walking and working. Its huge impact on society and on the way people live is obvious, although not always recognized. This impact starts within each individual, as music directly influences people’s emotional state, being able to make us feel sad, nostalgic, happy or energetic, among others (Thompson and Quinto 2011). This idea is studied by researchers in the fields of cognition and neuroscience, who state that emotions are one of the primary reasons people listen to music, i.e. music can express and induce emotions. Music expresses emotions when the listener associates an emotion with a certain piece or passage, e.g. this piece is sad. Music induces emotions when the listener actually feels the emotion, e.g. feels sad, while or after listening to the piece (Juslin and Sloboda 2011).

The idea that music expresses and/or induces emotion is quite old and a subject of discussion in distinct fields, such as Philosophy, Musicology, Neuropsychology, Anthropology, Sociology and Psychology (Juslin and Sloboda 2011). For instance, philosophers discuss how musical works, being non-sentient, are able to express emotions, or why people like to listen to music that makes them feel sad (Davies 2011). In Musicology, the relation between music and emotion is a matter of study that goes back as far as ancient Greece and the appearance of opera around 1600 (Cook and Dibben 2011). Neuropsychology, the discipline that analyses the association between neural mechanisms and mental functions, adopts a biological perspective, trying to understand whether music is in fact a biological function and, subsequently, examining whether musical emotions are biologically defined (Peretz 2011). In Anthropology, the focus is on the relation between a musical event, the performer and the audience, questioning and trying to measure the importance of culture in the listeners’ affective responses at that event (Becker 2011). As a subcategory of Sociology, Music Sociology focuses on the role of music in society and on how music influences people’s behavior (DeNora 2011). Last but not least, Psychology seeks to define what an emotion is and how it can be measured, focusing on people’s daily feelings, behaviors and bodily reactions (Juslin and Sloboda 2011).

In the psychological approach, researchers seek to define what mechanisms occur while people listen to music and, subsequently, what mechanisms allow them to experience or identify an emotion related to a particular musical piece (Juslin and Sloboda 2011). To achieve this goal, however, the main question that Psychology asks is: what is an emotion? The biggest obstacle to answering this question is the inherent difficulty of describing and measuring an emotion, as psychologists have failed to reach a consensus on its definition (Juslin and Sloboda 2011). Although the matter remains open to discussion, and considering that each person lives in a permanent affective state (Ekman and Davidson 1994, Barrett, Mesquita et al. 2007, Izard 2009), the majority of psychologists differentiate between ‘emotion’ and ‘mood’, describing the first as a strong reaction to a certain event (or events) that occurs in an external or internal environment (Beedie, Terry et al. 2005, Juslin and Sloboda 2011), and the second as a less powerful response to a certain stimulus (Juslin and Sloboda 2011).

Besides the differentiation between ‘emotion’ and ‘mood’, there is also a distinction between expressing and perceiving emotions, notions that are directly connected to the ideas of emotivism and cognitivism. Proposed by philosophers, these ideas consider that music is able both to express emotions and to induce emotional responses in listeners. However, this does not mean that music produces an emotional response in listeners all the time, as in most cases the listener only perceives the emotion. The distinction between the two ideas is based on the nature of the listener’s emotional response: ‘emotivists’ argue that music provokes real emotional reactions in listeners, whereas ‘cognitivists’ argue that music simply represents emotions that are perceived by listeners (Russell 2003). Despite the conceptual distinction, many researchers often confuse perceived emotions with induced emotions, as music may express emotions that are perceived by listeners (Scherer and Zentner 2001).

Although not completely defined, many factors and variables may be involved in the processes of perception and/or induction of emotions in music, namely: 1) the acoustic structure and characteristics of the sound sequence, meaning the music’s duration, amplitude, frequency, timbre and harmonic structure, among other properties that, taken as a whole, can constitute a mechanism for inducing emotion; 2) the execution, referring to the way music is performed by an artist, which takes into account the performer’s identity, ability and performance state, and which has an impact on the perception of emotions by listeners and on the induction of emotions in them; 3) the listener, as his or her individual and sociocultural identity is essential to the way emotions in music are perceived and induced; 4) the context, as for the performer and/or the listener the location, the environment and the surrounding materials are fundamental to the way emotions are perceived and/or induced through music. For example, a concert venue, a church, a car, listening with headphones, the presence of sounds that may disturb the process of listening and/or performing, and events such as funerals and weddings, among many others, are different variables associated with the listener’s and/or performer’s context (Juslin and Sloboda 2011).

As ‘emotivists’ state, music produces emotional responses in listeners (Scherer and Zentner 2001). Over the years, several researchers have concluded that there must be at least one mechanism by which an emotional reaction to music occurs (Scherer and Zentner 2001). In 2008, Juslin and Västfjäll proposed a model that specifies six mechanisms by which music is able to produce emotions (Juslin and Västfjäll 2008): 1) brain stem reflexes, when the brain stem signals a relevant event related to the acoustical characteristics of the music; 2) evaluative conditioning, which refers to the idea that an emotion may be induced by the repeated pairing of a musical stimulus with a positive or negative event; 3) emotional contagion, when a listener perceives an emotion associated with the music and that perception is then translated into an induced emotion; 4) visual imagery, when a listener associates a visual image with the music being listened to; 5) episodic memory, concerning the emotional response that is induced in a listener when the music recalls certain emotive past events; and 6) musical expectancy, related to the listener’s expectations regarding the music being listened to (Scherer and Zentner 2001). The idea behind this theoretical framework is that researchers will be able to better understand the mechanisms that contribute to the production of an emotional response to music. The mechanisms do not operate in isolation, so it may be necessary to combine the study of several mechanisms in order to explain the perceived emotions (Juslin and Västfjäll 2008).

As specified above, one of the difficulties in explaining an emotion is related to the ability to measure it. Some might say that measuring emotions is not scientifically possible, as feeling or perceiving emotions is inherently subjective. However, because people’s emotional reactions to a given musical stimulus vary from one person to another, these reactions are valid subjects of scientific study (Zentner and Eerola 2011). That being said, emotions can be measured based on the conjunction of three forces: feelings, behaviors and bodily reactions (Juslin and Sloboda 2011). Directly related to each of these forces, there are three ways to measure emotions: self-report, behavioral and psychophysiological measures (Juslin and Sloboda 2011).

As it is difficult to identify whether a person is having some kind of psychological reaction when (or after) listening to a certain musical piece, the majority of studies are based on self-report measures, since the only way to assess the listener’s emotional state regarding the music is to ask them (Scherer and Zentner 2001). However, this kind of measurement may be insufficient to conclude that a person is actually experiencing a specific emotion. Also, the way a self-report is conducted by the researcher may cause the listener to confuse the notions of perception and expression of emotions (Juslin and Sloboda 2011). According to several studies, listeners can distinguish between induced and perceived emotions depending on how data is collected. The tests conducted in those studies asked participants to answer two questions at once: what emotions they perceive and what emotions they feel while listening to music. With this approach, the number of misleading and confused answers dropped substantially (Juslin and Sloboda 2011).

Movement has always been considered a major factor when discussing emotional responses to music. There is evidence that people tend to coordinate their movements with music when listening to it, e.g. tapping a foot, clapping, dancing or fidgeting (Juslin and Sloboda 2011). Nowadays, people listen to music in concert venues, while walking down the street with headphones, and in intimate environments such as their houses and cars. The social context entices people to behave expressively when confronted with a certain musical stimulus. However, it is important to realize that much music was actually created with the goal of generating expressive behavior in people, no matter the context (Scherer and Zentner 2001). Despite the huge number of social events, such as concerts, that constitute a suitable environment for research based on behavior observation, psychologists have largely neglected these contexts (Juslin and Sloboda 2011).

Besides the measurement of feelings and behaviors, another indicator, based on bodily reactions, can be used for measurement purposes, as music is able to generate physiological reactions derived from emotions. Music can have a significant impact on our body, provoking involuntary changes in our nervous system, heart rate, muscular tension, blood pressure, skin temperature, among others (Scherer and Zentner 2001). Music that induces arousal has a tendency to produce bodily reactions, such as an increase of heart rate and muscular tension. On the other hand, music that induces calmness results in a decrease of heart rate, blood pressure and stress (Scherer and Zentner 2001). To measure these emotional reactions, specific instruments and devices can be used, such as an ECG (electrocardiogram), which measures the heart and pulse rate; EGG (electrogastrogram), which measures gastric motility; Facial EMG, which measures zygomaticus activity; FPA (finger pulse amplitude), which measures blood volume; among many others related with other bodily reactions, such as respiration, skin conductance, facial expressions and head movements (Hodges 2011).

Normally, researchers apply a single method in their studies. Other studies, however, measure emotions based on a combination of more than one method, although combining all methods at once is rare. For example, applying only self-report techniques may produce an analysis based on perceptual judgment (Scherer and Zentner 2001). To avoid unfounded conclusions and to distinguish between perceived and induced emotions, it may be more reliable to use more than one type of measure at once (Juslin and Sloboda 2011).


Measures and models of emotion are intrinsically connected, as self-report instruments are derived from two models of emotion: 1) discrete or basic emotion theory; and, 2) dimensional model of emotion (Zentner and Eerola 2011).

Discrete or basic emotion theory argues that there is a group of innate emotions common to all people and that all other emotions are sub-categories of these. This group includes fear, anger, sadness, happiness and disgust (Ekman 1992). Earlier studies focused on the categorical approach to classifying emotions. This approach is based on the presumption that every emotional event is associated with a specific category of emotions, which in turn contains a set of derived emotions (Tomkins 1962, Izard 1977, Lazarus 1991, Ekman 1992, Plutchik 1994, Power and Dalgleish 2007). In 1936, Hevner proposed what is perhaps the best-known model within this approach, Hevner’s adjective circle (figure 1), which organizes sixty-six emotional labels into eight clusters (Hevner 1936). Hevner’s model and other similar models, such as Farnsworth’s, which proposed ten mood clusters instead of eight (Farnsworth 1958), are relevant as they provide researchers with an organized and global framework of emotional labels that can be applied to their studies. However, critics of the categorical approach point out that it is a limited model, as it is difficult to represent the full breadth of emotions that humans have the capacity to recognize (Yang and Chen 2011).

As an alternative, a dimensional approach was taken. New models were proposed that classify emotions according to their distribution over one or more dimensions of affect (Juslin and Sloboda 2011). Several were considered, but the most relevant and widely applied is Russell’s two-dimensional circumplex model (Russell 1980). As seen in figure 2, the model features the dimensions of arousal and valence, and has twenty-eight emotional labels distributed in a circular structure over its four quadrants. Russell’s circumplex model characterizes emotions as distinguishable from one another and explores the idea that emotions can be bipolar, as two emotions can be considered the opposite of one another, e.g. happy vs. sad (Kim, Schmidt et al. 2008, Juslin and Sloboda 2011). One criticism is that the model creates some confusion between emotions that are very close to each other, e.g. afraid and annoyed, as these emotions are very distinct from one another yet lie close together in the circular structure (Juslin and Sloboda 2011). Another criticism is that it is possible to experience negative and positive emotions at the same time, which calls into question the existence of a valence dimension in such terms (Juslin and Sloboda 2011).
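
To make the two-dimensional structure concrete, the sketch below maps a point on the plane to one of the four quadrants. It is only an illustration under stated assumptions: the function name `va_quadrant`, the convention that both axes are centred on zero, the handling of points lying on an axis and the example coordinates are all hypothetical and are not taken from Russell’s model or from the system described in this thesis.

```python
def va_quadrant(valence: float, arousal: float) -> str:
    """Name the quadrant of the valence-arousal plane containing a point.

    Assumes both axes are centred on 0. Points lying exactly on an axis
    are assigned to the positive/high side, an arbitrary choice made for
    this sketch.
    """
    valence_part = "positive valence" if valence >= 0 else "negative valence"
    arousal_part = "high arousal" if arousal >= 0 else "low arousal"
    return f"{valence_part}, {arousal_part}"


# Hypothetical coordinates, chosen only to illustrate the four quadrants.
examples = {
    "happy": (0.8, 0.5),
    "afraid": (-0.6, 0.7),
    "sad": (-0.7, -0.4),
    "calm": (0.6, -0.6),
}

for label, (valence, arousal) in examples.items():
    print(label, "->", va_quadrant(valence, arousal))
```

In this representation, the bipolarity mentioned above corresponds to labels lying on opposite sides of an axis, as with happy and sad along the valence dimension.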


Figure 1: Hevner’s adjective circle (Hevner 1936). Source: (Caetano and Wiering 2012)

Despite all the mechanisms and factors described above by which music produces emotional responses, one fundamental particularity has not yet been discussed: the self-conscious musical choices made by listeners. People select music according to certain functions or actions, e.g. music to listen to while travelling, working, walking down the street, cooking or exercising. The reasons vary, but people may choose to hear certain songs because they are bored or lack energy, to increase focus, or even to attribute meaning to certain events (e.g. funerals and weddings) (Juslin and Sloboda 2011). It therefore seems fair to conclude that the goals and motivations of a listener are extremely important in obtaining an emotional response to music from him or her, even when that emotional response was not intended (Juslin and Sloboda 2011).

Figure 2: Russell’s circumplex model of Affect (Russell 1980). Source: (Caetano and Wiering 2012)

2.2 Recommender Systems

Currently, almost every Web application that bases its business model on content directed at a specific community (e.g. Facebook¹ and Twitter²) or that sells products online (e.g. e-commerce platforms such as Amazon³, Alibaba⁴ and eBay⁵) displays recommendations of items on its platform (Jannach 2008). To provide this functionality, these platforms integrate certain tools and techniques that are able to suggest items that are somehow relevant to users. These tools and techniques are designated as Recommender Systems (RSs) (Resnick and Varian 1997, Burke 2007, Mahmood and Ricci 2009). The notion of “item” refers to what is being recommended, and items can vary in type (e.g. music, movies and news) (Ricci, Rokach et al. 2011).

RSs are based on a simple logic: people value other people’s recommendations, e.g. people tend to watch movies with good reviews or go to restaurants that friends recommended.

¹ https://www.facebook.com
² https://www.twitter.com
³ https://www.amazon.com
⁴ https://www.alibaba.com
⁵ https://www.ebay.com


However, despite the obvious conceptualization, RSs only appeared when e-commerce platforms had grown substantially and presented huge amounts of content to users without an apparent logical order. Users found it hard to decide which products should be bought. As a solution, these services implemented strategies towards personalization of content to users and, to do this, they began to save the users’ information and feedback (Ricci, Rokach et al. 2011).

Normally, recommendations are generated taking into account users’ individual profiles, so the proposed content is personalized and varies between users within the same platform. RSs’ algorithms are built to understand and predict what information should be suggested to a user in a certain context, relating all the data that the system has gathered about that user, i.e. the user’s preferences. On the other hand, recommendations can also be non-personalized, meaning that the focus is not on users’ profiles but on the content of the platform itself (e.g. music webzines) (Ricci, Rokach et al. 2011).

Having an RS in a service brings many benefits, which can be analyzed from two perspectives: that of the service provider and that of the user. From the service provider’s perspective, an RS has plenty of advantages, such as increasing item sales and selling more diverse items; improving users’ experience and satisfaction, which subsequently improves user loyalty to the service; and making it possible to improve business strategies based on knowledge of users’ preferences. From the user’s perspective, RSs provide access to better and more varied items based on their preferences and goals; recommendations based on the context they are in; several items recommended at once; and several items recommended as a package that fit together as a coherent group, among other advantages (Ricci, Rokach et al. 2011).

RSs can also be considered a multi-disciplinary field, as research in several computer science fields, such as artificial intelligence (specifically machine learning and data mining), information retrieval and human-computer interaction, has deeply contributed to their development (Ricci, Rokach et al. 2011).

Several types of RSs are known to exist. They can be differentiated by the domain in which they are used, the knowledge they rely on and the recommendation algorithm. Six types of recommender systems are then defined: Content-based, Collaborative filtering, Demographic, Knowledge-based, Community-based and Hybrid (Burke 2007).

Content-based recommender systems try to understand what items should be proposed by knowing which items the user liked previously. To do this, the characteristics of the previously liked items are used as objects of comparison, as the recommended items are similar to them (Ricci, Rokach et al. 2011).

Collaborative filtering recommender systems suggest items based on comparisons between users. If a user has similar preferences to another user, then the items that the latter liked will be suggested to the former (Schafer, Frankowski et al. 2007).

Demographic recommender systems suggest items based on the user’s demographic profile, taking into account data such as age, language, city and country, among others (Mahmood and Ricci 2007).

Knowledge-based recommender systems suggest items based on the knowledge of a specific domain. These systems try to recommend the most useful items to users depending on the context (Bridge, Göker et al. 2005, Ricci, Cavada et al. 2006).

Community-based recommender systems suggest items based on the idea that a user tends to like what their friends also like (Ben-Shimon, Tsikinovsky et al. 2007, Arazy, Kumar et al. 2009). These suggestions are generated after analysis of the activity of a user’s friends on social networks (e.g. Facebook⁶ and Twitter⁷). These community-based systems are usually denominated as social recommender systems (Golbeck 2006).

Hybrid recommender systems result from a combination of the previously listed types of recommender systems. The goal of these systems is to apply the best features of each type to their advantage and, subsequently, to compensate for and limit the known issues in each (Ricci, Rokach et al. 2011).

As specified above, content-based recommender systems suggest items based on what a user liked in the past. Basically, the recommendation algorithm collects the features of the items the user rated and builds a user model containing this information. The user model is a reflection of the user’s preferences (Mladenic 1999). When analyzing an item, the algorithm compares the item features with the user model features and, if the system detects a relevant match, the item is designated as interesting to the user (Lops, De Gemmis et al. 2011). For this recommendation process to succeed, three components must be considered: 1) Content Analyzer - responsible for analyzing the data from items and extracting their features. The information resulting from the analysis is used by the next two components; 2) Profile Learner - in which the user’s preferences are recognized and the user model is formed; 3) Filtering Component - in which the user model features are compared with the features of the items available for recommendation, resulting in relevant suggestions for the user (Lops, De Gemmis et al. 2011). After the recommendation process finishes, the user’s profile is updated by collecting and storing the user’s feedback, which can be of two types: positive (i.e. the features the user liked) and negative (i.e. the features the user neglected) (Lops, De Gemmis et al. 2011). The saved feedback is then used in the next iteration and the process is repeated.
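The three components described above can be illustrated with a minimal sketch. It assumes items are described by simple textual tags and that the user model is a bag of feature counts; the function names (`analyze`, `learn_profile`, `filter_items`) and the toy catalog are hypothetical and are not taken from Lops, De Gemmis et al. (2011).

```python
from collections import Counter


def analyze(item_tags):
    """Content Analyzer: turn raw item tags into a bag of features."""
    return Counter(tag.lower() for tag in item_tags)


def learn_profile(liked, disliked):
    """Profile Learner: add features of liked items, subtract disliked ones."""
    profile = Counter()
    for features in liked:
        profile.update(features)
    for features in disliked:
        profile.subtract(features)
    return profile


def filter_items(profile, catalog, top_n=2):
    """Filtering Component: rank catalog items by their match with the profile."""
    def score(features):
        # Features missing from the profile count as 0.
        return sum(profile[f] * count for f, count in features.items())

    ranked = sorted(catalog, key=lambda name: score(catalog[name]), reverse=True)
    return ranked[:top_n]


# Hypothetical toy catalog (assumption, for illustration only).
catalog = {
    "track_a": analyze(["rock", "guitar", "fast"]),
    "track_b": analyze(["jazz", "piano", "slow"]),
    "track_c": analyze(["rock", "piano", "slow"]),
}

# Positive feedback on rock/guitar, negative feedback on slow tracks.
profile = learn_profile(
    liked=[analyze(["rock", "guitar"])],
    disliked=[analyze(["slow"])],
)
recommendations = filter_items(profile, catalog)
print(recommendations)
```

Feeding new positive and negative feedback back into `learn_profile` corresponds to the iterative profile update mentioned in the text.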

To collect and store the user’s feedback, two techniques can be applied: explicit and implicit feedback. Explicit feedback depends on the user rating the suggested items and, although it is a simple and easy way to understand the user’s preferences, it does not provide enough information about how exactly the user felt about the suggested items.

⁶ https://www.facebook.com
⁷ https://www.twitter.com

There are three ways to collect explicit feedback from users: like/dislike (Billsus and Pazzani 1999), ratings (Shardanand and Maes 1995, Pazzani and Billsus 1997) and text comments (Resnick, Iacovou et al. 1994). On the other hand, implicit feedback does not require direct involvement from the user, as all information is collected and monitored by the system. This technique is somewhat more complex but, in addition to not requiring direct participation from the user, it collects more data about the user’s activity on the suggested items (Lops, De Gemmis et al. 2011).

The use of content-based recommender systems has advantages and disadvantages, especially compared with other RS types, such as collaborative filtering (Lops, De Gemmis et al. 2011). First, content-based systems do not require other users’ likes, as they only depend on the activity of the user. Another advantage is related to the suggested items and their features, as these provide enough information to explain how items are being recommended by the system. Lastly, a content-based system allows new items (i.e. items not yet liked by any user) to be suggested (Lops, De Gemmis et al. 2011).

On the other hand, the items’ features are limited in number, meaning that the combinations the recommendation algorithm can make are also limited. In addition, if the system operates over a specific domain (e.g. music or movies), then it requires information about that domain. Only in this way will it be able to relate the items’ features to the users’ likes. Another disadvantage is the inability to deal with new items that do not match the user’s preferences. Finally, content-based systems struggle to suggest relevant items if the user does not have a complete profile. This often happens with new users of the system (Lops, De Gemmis et al. 2011).

As explained above, items in content-based recommender systems contain information that is denominated as features, and the majority of these are basically textual information. This generates several problems related to the nature of language itself, as users’ profiles are keyword-based, meaning the recommendation algorithm depends completely on string matches between the profile and the item. There are two possible problems derived from string matching that can affect the recommendation of items: 1) Polysemy - a word can have several meanings and, due to that, an item can be considered relevant when it is not; 2) Synonymy - several words can have the same meaning and, subsequently, relevant items can be overlooked. To solve these problems, semantic analysis is being applied. Essentially, it consists of using knowledge bases to relate items to user profiles, which then generates a semantic logic that helps to define what should be suggested to a user (Lops, De Gemmis et al. 2011).
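The synonymy problem can be made concrete with a small sketch. It assumes a tiny hand-written synonym table standing in for a real knowledge base; the table `SYNONYMS` and the function names are hypothetical and only illustrate the idea of mapping keywords to shared concepts before matching.

```python
def keyword_match(profile_keywords, item_keywords):
    """Plain string matching: relevant only if some keyword is literally shared."""
    return bool(set(profile_keywords) & set(item_keywords))


# Hypothetical synonym table standing in for a knowledge base (assumption).
SYNONYMS = {"joyful": "happy", "cheerful": "happy", "melancholic": "sad"}


def normalize(keywords):
    """Map each keyword to its concept, leaving unknown words unchanged."""
    return {SYNONYMS.get(word, word) for word in keywords}


def semantic_match(profile_keywords, item_keywords):
    """Match on concepts instead of raw strings."""
    return bool(normalize(profile_keywords) & normalize(item_keywords))


profile = ["happy", "rock"]
item = ["joyful", "pop"]

plain = keyword_match(profile, item)      # misses the item: no literal overlap
semantic = semantic_match(profile, item)  # finds it: "joyful" maps to "happy"
print(plain, semantic)
```

Polysemy is the opposite failure: plain matching on a word such as "rock" would also accept an item about geology, which a concept-level representation is meant to rule out.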

Another topic to be considered is the evaluation of recommendation systems. Selecting the correct recommendation system for an application is not as easy and direct as might be supposed. As mentioned before, there are several types of RSs and, within these, various distinct recommendation algorithms that can be applied. Selecting the correct ones depends heavily on the domain and goals of the application itself. Thus, in order to make a decision, the recommendation algorithms need to be evaluated by comparing the performance of each via experiments, which then results in a ranked list. To make an accurate evaluation, there must be an awareness that users use recommendation systems not only to have items suggested according to their tastes, but also because they want to find new and diverse items, as well as other value propositions that these systems (may) offer (Shani and Gunawardana 2011).

There are three types of experiments that can be made to evaluate a RS: offline experiments, user studies and online experiments. For each, it is important to follow three guidelines: 1) Hypothesis - each experiment must be based on a hypothesis; 2) Controlling variables - all variables must be the same when comparing recommendation algorithms; 3) Generalization power - it is important to use diverse datasets, in order to reach conclusions that can be generalized beyond the limits of that data (Shani and Gunawardana 2011).

Of the three experiments, offline experiments are the simplest to conduct. They do not require the existence of real users, as the datasets used already contain a sample collection of users’ ratings. These datasets permit the simulation of users’ behavior on a RS and, therefore, allow the comparison of different algorithms. Not requiring real users makes these experiments less expensive than the others. However, they are limited in the range of questions they can answer. Offline experiments are definitely relevant when trying to predict the accuracy of an algorithm and, subsequently, when limiting the number of algorithms that should be considered for a system. Hence, offline experiments can be executed first and then the resulting algorithms can be tested with user studies or online experiments (Shani and Gunawardana 2011).
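An offline comparison of this kind can be sketched as follows. The sketch assumes a tiny hypothetical ratings dataset split into a training part and a held-out part, two deliberately simple predictors, and mean absolute error (MAE) as the accuracy measure; none of these choices come from the thesis, they only illustrate how stored ratings let algorithms be ranked without real users.

```python
def mean_absolute_error(predict, held_out):
    """Average absolute gap between predicted and actual held-out ratings."""
    errors = [abs(predict(user, item) - rating) for user, item, rating in held_out]
    return sum(errors) / len(errors)


# Hypothetical (user, song, rating) triples; the held-out part simulates unseen behavior.
train = [("u1", "s1", 5), ("u1", "s2", 3), ("u2", "s1", 4), ("u2", "s3", 2)]
held_out = [("u1", "s3", 2), ("u2", "s2", 3)]

global_mean = sum(rating for _, _, rating in train) / len(train)


def predict_global(user, item):
    """Baseline: always predict the average rating in the training data."""
    return global_mean


def predict_item_mean(user, item):
    """Predict the average training rating of the item, falling back to the global mean."""
    ratings = [r for _, i, r in train if i == item]
    return sum(ratings) / len(ratings) if ratings else global_mean


results = {
    "global_mean": mean_absolute_error(predict_global, held_out),
    "item_mean": mean_absolute_error(predict_item_mean, held_out),
}
ranking = sorted(results, key=results.get)  # lower error first
print(results, ranking)
```

Both predictors are evaluated on exactly the same held-out data, which corresponds to the guideline of controlling variables when comparing algorithms.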

Performing user studies does not depend on doing offline testing, even though user studies can complement offline tests by providing more information about the RS. User studies are chosen to make performance tests with real users and in a real interaction environment, not a simulated one. Thus, user studies have greater value than offline experiments, as they are more reliable and complete when testing recommender system performance (Shani and Gunawardana 2011). The experiments consist of gathering a sample of test subjects and asking them to interact with the recommender system. Each test subject completes the same tasks. At the same time, researchers observe the experiment and collect data based on the users’ behavior during the process. Very often, in order to question subjects about the system itself and collect qualitative data, these experiments are complemented with questionnaires. The collected data, both quantitative and qualitative, is then used for later analysis of the system performance (Shani and Gunawardana 2011). The advantages of user studies are clear. They allow the collection of more, and more diverse, data in tests in which the interaction is done with real users. Nonetheless, it is important to consider some disadvantages as well. User studies are more expensive than offline experiments, as there is the need to gather a group of test subjects, who may be paid or not, and the group is usually small as a result. Additionally, the process may be time consuming, limiting tests to a small set of interactions. Also, in order to be ready for final tests, these experiments must first be piloted and fixed. Lastly, the recruitment of the test subjects is of great importance, as these must represent, as closely as possible, the users of the recommender system (Shani and Gunawardana 2011).

The third type, online experiments, is the most reliable and consistent when it comes to comparing recommender systems. Like user studies, online experiments also use real users. However, those users perform real tasks in a final and public environment. The combination of both factors, real users in real online systems, allows more effective tests and performance evaluations between systems. Online experiments present several recommendation choices on the system and then collect all possible data resulting from users’ interaction. Measuring this determines which recommendation algorithm is more accurate. For an online testing system to perform correctly, it is important to consider the following possible issues: 1) users tend to abandon the system if recommendations have no value for them; 2) the system’s interface is of great importance, as it has a great impact on users’ perception and also dictates whether they return to it; 3) users must have items suggested to them in a randomized fashion, otherwise tests will not be fair between different systems (Shani and Gunawardana 2011).
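The third issue, randomized exposure, is often handled by assigning each user to one of the competing algorithms in a way that does not depend on who the user is. The sketch below shows one possible approach, assumed here for illustration: hashing a user identifier so that the assignment is pseudo-random across users but stable for each user. The variant names and user identifiers are hypothetical.

```python
import hashlib


def assign_variant(user_id, variants=("algorithm_a", "algorithm_b")):
    """Deterministically map a user to one variant using a hash of the user id."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]


# Hypothetical user ids; each user always sees the same algorithm.
assignments = {f"user{i}": assign_variant(f"user{i}") for i in range(1000)}
counts = {
    variant: list(assignments.values()).count(variant)
    for variant in ("algorithm_a", "algorithm_b")
}
print(counts)
```

Keeping the assignment stable per user avoids a single user alternating between algorithms within the same experiment, which would make the collected interaction data harder to attribute.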

Recommender systems operate over several domains, sometimes over a specific domain (e.g. movies and sports news) and sometimes over a mixture of different domains (e.g. on e-commerce platforms). The items that are suggested to a user by a system are contextualized in the domain(s) of the application (Burke and Ramezani 2011). In the next section of this document, we will discuss recommender systems in the music domain.

2.3 Music Recommendation Systems

As previously explained, music is fundamental in people’s lives and has a lot of influence on society nowadays (Thompson and Quinto 2011). In fact, it is known that people listen to music more often than they watch movies or television (Rentfrow and Gosling 2003). The arrival of digital music and the emergence of Web 2.0 technologies created an explosion of musical content on the Internet, content that is now available to millions of people all over the world (Aghaei, Nematbakhsh et al. 2012). With such a huge amount of information, growing constantly and at a quick pace, it is impossible for anyone to keep up and be aware of the latest trends in music, or even to find relevant content according to their tastes and goals (Braunhofer, Kaminskas et al. 2013).

In order to solve this problem, this amount of musical content needs to be organized in such a way that it becomes recognizable to all (Pachet and Aucouturier 2004). Currently, the music catalog is somewhat well organized, as research in music information retrieval was able to propose solutions to several problems, including artist identification and instrument recognition (Marques and Moreno 1999, Mandel and Ellis 2005), and genre classification (Tzanetakis and Cook 2002, Tsunoo, Tzanetakis et al. 2011, Song, Dixon et al. 2012). However, such an organization is not enough to meet the tastes and goals of users. To accomplish this, several tools and mechanisms, able to filter and suggest music to users in a personalized manner, were developed. These are denominated as music recommendation systems (Whitman 2005, Song, Dixon et al. 2012). These systems rely on recommendation strategies that vary from one to another and, based on users’ preferences, should be able to suggest ordered lists of musical items – playlists (Herrada 2009, Song, Dixon et al. 2012).

At the moment, many well-known and successful commercial systems have music recommendation on their services, such as Pandora⁸, Spotify⁹, Last.fm¹⁰, Amazon¹¹, Shazam¹², Allmusic¹³, Rdio¹⁴ and so on.

The success of music recommender systems depends on the ability to suggest relevant items to users. Three elements are essential for an efficient music recommendation system: 1) users’ profiles and the differences between them, as these will allow the understanding of each user’s tastes and preferences; 2) music items containing properties that will be used on recommendation algorithms; 3) the matching relation between a user profile and an item (Song, Dixon et al. 2012).

To improve their predictive capacity, music recommender systems must collect and store a large amount of information about users and their musical preferences. In order to generate a user profile, the majority of music recommendation systems ask users to provide feedback through two different approaches: explicit feedback, in which the user rates, likes or blocks songs, artists or albums; and implicit feedback, in which the system collects user information regarding purchases and listening activity (Laplante 2014).

However, user feedback is insufficient. Research indicates that, if a music recommender system depends on the user’s profile to predict which musical items to suggest according to his or her tastes, then it is fair to say that there is an association between music preferences and information about the user, such as demographic data (e.g. age, gender and nationality), personal characteristics (e.g. religious views, political orientation, life style and personality traits) and social factors (e.g. friends and family) (Song, Dixon et al. 2012, Laplante 2014). This dependency between users’ profiles and information about the users themselves determines the need to generate user models in music recommender systems. These can be separated in two: user profile modeling and user experience modeling (Laplante 2014).

According to the first model, three specific domains categorize the user profile: demographic (e.g. age and gender), geographic (e.g. location) and psychographic (e.g. mood, interests and personality) (Rentfrow, Goldberg et al. 2012, Laplante 2014). The second model takes into account the level of experience a listener has in terms of music, stating that this determines users’ expectations (Laplante 2014). Four types of users were defined according to their level of music expertise: savants (i.e. music fanatics), enthusiasts (i.e. love music but have other relevant tastes), casuals (i.e. listen to music casually) and indifferents (i.e. don’t really care about music) (Coulangeon and Lemel 2007).

⁸ https://www.pandora.com
⁹ https://www.spotify.com
¹⁰ https://www.last.fm
¹¹ https://www.amazon.com
¹² https://www.shazam.com
¹³ https://www.allmusic.com
¹⁴ https://www.rdio.com

Other research mentions that there is more user information that may have an impact on user modeling. For example, some music recommendation systems should consider the listener’s behavior while listening to music, storing information about how many times each music track is played by the user, the diversity of music selected and if the user prefers popular or non-popular music (Farrahi, Schedl et al. 2014).

Music items are just as important as users’ profiles in recommender systems. These items contain music metadata that is used in information retrieval. Music metadata is categorized in three parts: editorial, cultural and acoustic (Pachet 2005). Editorial metadata is the data related to the domain of the item itself (e.g. artist, title, album and genre) and is provided by the editor. On the other hand, cultural metadata is a result of textual information analysis based on the comparison of similar music items. Lastly, acoustic metadata is the data resulting from audio signal analysis (e.g. pitch and tempo) (Song, Dixon et al. 2012).

There are several approaches in music recommendation taken by existing music recommender systems: metadata information retrieval, collaborative filtering, content-based information retrieval, emotion-based model, context-based information retrieval and hybrid (Song, Dixon et al. 2012).

The metadata information retrieval approach uses editorial metadata made available by editors to search for relevant music tracks. The most common types of textual metadata used are song and album titles, as well as artist names (Downie 2003). However, despite being an accurate and standard method, several issues can be identified, such as: 1) in order to be able to find specific songs, users must be aware of their metadata information; 2) with the constant growth of the digital music catalog, editing such an amount of metadata information can be a painful task in terms of the time it takes; 3) most importantly, items are only suggested according to their metadata, as users’ preferences are not taken into account (Song, Dixon et al. 2012).

Collaborative filtering is probably the most used technique. It compares the behavior of users within the system and the items they rate, like or play. If the behavior and the rated items are similar between users, then the items to propose to each are predicted based on this (Balabanović and Shoham 1997, Resnick and Varian 1997). Collaborative filtering is categorized in three: memory-based, model-based and hybrid (Sarwar, Karypis et al. 2001, Su and Khoshgoftaar 2009). The memory-based technique is based on lists of users’ ratings of items that are collected by the system. It tries to predict and suggest items based on users who have made a good number of ratings (Burke 2002). The model-based approach is based on the user’s tastes and collected data. Using machine learning and data mining techniques, the system applies these algorithms with the notion that the user model learns and evolves in parallel with the usage made by the user (Adomavicius and Tuzhilin 2005). Hybrid collaborative filtering combines both of these models and regularly obtains better recommendation results than each model individually (Wang, De Vries et al. 2006).
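The memory-based variant can be sketched as a user-based nearest-neighbour recommender. The sketch assumes a small hypothetical ratings table and cosine similarity between users' rating vectors; both are illustrative choices and are not prescribed by the works cited above.

```python
from math import sqrt

# Hypothetical user -> {song: rating} table (assumption, for illustration only).
ratings = {
    "ana": {"s1": 5, "s2": 4, "s3": 1},
    "rui": {"s1": 5, "s2": 5, "s4": 4},
    "eva": {"s1": 1, "s3": 5, "s5": 4},
}


def cosine(a, b):
    """Cosine similarity between two sparse rating vectors."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, k=1):
    """Suggest unseen items rated by the k users most similar to `user`."""
    neighbours = sorted(
        (other for other in ratings if other != user),
        key=lambda other: cosine(ratings[user], ratings[other]),
        reverse=True,
    )[:k]
    scores = {}
    for neighbour in neighbours:
        weight = cosine(ratings[user], ratings[neighbour])
        for item, rating in ratings[neighbour].items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + weight * rating
    return sorted(scores, key=scores.get, reverse=True)


print(recommend("ana"))
```

In this toy table "rui" rates the shared songs much like "ana" does, so the song that only "rui" has rated is the one suggested to "ana".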

Several drawbacks are known to exist in collaborative filtering, especially regarding the assumption that if a user rates the same items or behaves similarly to other users in a system, then they must have the same preferences and goals in a music recommendation system (Song, Dixon et al. 2012). There are three issues in this approach: 1) Cold start – recommendations are not very effective for a user or system with a small number of ratings; 2) Popularity bias – the best-known music tracks are also the ones with the majority of ratings, making it hard to suggest non-popular, but relevant, items; and, 3) Human effort – first, it requires users to rate at a regular and constant pace and, second, users who rate the most have the biggest impact on the system, which is not necessarily the most effective strategy (Herlocker, Konstan et al. 2004).

The content-based information retrieval approach recommends items similar to others that the user previously listened to in the system (Aucouturier and Pachet 2002, Logan 2004). Items are compared via the features that each item, i.e. music track, contains. To obtain these features, an analysis of each track is required (Li, Kim et al. 2004, Adomavicius and Tuzhilin 2005). Based on these features, it is possible to measure the similarity or distance between two or more tracks (Logan 2004), using measurement methods such as K-means clustering with Earth-Mover’s Distance (Logan and Salomon 2001), Expectation-Maximization with Monte Carlo Sampling (Pachet and Aucouturier 2004) and Average Feature Vectors with Euclidean Distance (Chordia, Godfrey et al. 2008).
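Of the measurement methods listed, the average feature vector with Euclidean distance is the simplest to illustrate. The sketch below assumes each track has already been analyzed into a list of per-frame feature vectors (for instance, cepstral coefficients); the numbers are invented and two-dimensional purely for readability, and the function names are hypothetical.

```python
from math import dist  # Euclidean distance, available from Python 3.8

# Hypothetical per-frame feature vectors for three tracks (assumption).
frames = {
    "song_a": [[1.0, 2.0], [3.0, 4.0]],
    "song_b": [[2.0, 3.0], [2.0, 3.0]],
    "song_c": [[9.0, 9.0], [11.0, 7.0]],
}


def average_vector(frame_list):
    """Summarize a track by the mean of its frame-level feature vectors."""
    count = len(frame_list)
    return [sum(column) / count for column in zip(*frame_list)]


def most_similar(seed):
    """Order the other tracks by Euclidean distance to the seed track."""
    centroids = {name: average_vector(f) for name, f in frames.items()}
    others = [name for name in centroids if name != seed]
    return sorted(others, key=lambda name: dist(centroids[seed], centroids[name]))


print(average_vector(frames["song_a"]))
print(most_similar("song_a"))
```

Averaging discards how the features evolve over time, which is one reason the other listed methods model each track as a distribution rather than a single point.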

Although the content-based information retrieval approach is complementary to collaborative filtering, as it does not require human effort, it is not scientifically proven that it is representative of the users’ preferences and goals in a music recommendation system. Also, this technique depends on the number of features extracted from song tracks. The bottom line is that the content-based model struggles in the area where the collaborative approach performs well (Song, Dixon et al. 2012).

As explained above, music can express emotions. Recently, some research has started to study music emotion recognition and how it can be applied to recommendation systems (Kim, Schmidt et al. 2010, Panda and Paiva 2011). These systems use models of emotion that represent and propose a classification of emotions, such as Russell’s circumplex model (Russell 1980). The emotion-based model approach associates emotions with specific acoustic features, such as tempo, rhythm, time, energy and harmony (Wang, Chen et al. 2011).

In addition to the fact that research in this field is limited, the emotion-based model has a few problems to consider. The music recommendation system requires a complete and accurate dataset to suggest reliable items. Building these datasets is time consuming and requires manpower, greatly increasing the human effort (Skowronek, McKinney et al. 2006). Also, the classification of emotions, although it exists, is subjective, as is the very definition of emotion. As a result, the suggested items may produce different affective experiences in users, making it hard to improve prediction (Yang and Lee 2004, Shao, Li et al. 2008).

The context-based information retrieval approach collects and uses information about the activity of users on social networks (e.g. Facebook¹⁵ and YouTube¹⁶). The algorithms use social information, taking into account users’ comments, posts, ratings, likes, shares, tags and the activity of friends (Lamere 2008, Su, Yeh et al. 2010). The main drawback of this model is related to popularity bias, i.e. the majority of suggestions focus on popular music content (Eck, Lamere et al. 2008).

Lastly, the hybrid approach combines at least two of the models detailed above. There are no limitations specific to it, as a hybrid system inherits the drawbacks of the models it encapsulates. However, it is fair to say that, being a combination of several approaches, a hybrid system is generally more efficient than each of the models individually (Balabanović and Shoham 1997, Yoshii, Goto et al. 2006, Yoshii, Goto et al. 2007).
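One simple way of combining models, assumed here purely for illustration, is a weighted sum of the scores that each model assigns to candidate items. The cited hybrid systems use more elaborate schemes; the score dictionaries, weights and function name below are hypothetical.

```python
def hybrid_scores(score_lists, weights):
    """Combine per-model item scores into one score per item by weighted sum."""
    combined = {}
    for scores, weight in zip(score_lists, weights):
        for item, score in scores.items():
            combined[item] = combined.get(item, 0.0) + weight * score
    return combined


# Hypothetical normalized scores from two different approaches (assumption).
collaborative = {"s1": 0.9, "s2": 0.2}
content_based = {"s2": 0.8, "s3": 0.6}

combined = hybrid_scores([collaborative, content_based], weights=[0.5, 0.5])
ranking = sorted(combined, key=combined.get, reverse=True)
print(ranking)
```

An item that only one model knows about, such as a new track with no ratings, still receives a score from the other model, which is how a combination can soften problems like cold start.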

Although not primary, some other limitations in music recommendation systems must be considered. These are not related to the recommendation algorithms and the approaches described above, but do have an impact on the user’s experience while listening to music tracks suggested by a system. Three issues are presented: 1) Dynamic Evolvement – the inability of the system to properly handle the insertion of new data, whether users or items. The dynamic evolvement of a system is vital to continue suggesting relevant items according to users’ tastes (Balabanović and Shoham 1997, Hu and Ogihara 2011); 2) Playlist Generation – the sequence of items suggested to a user is very often ineffective, as the original item from which the recommendation started loses relevance during the process. The result is a misrepresentation of the recommendation theme (Lee, Bare et al. 2011). To solve the issue, some have argued that a playlist requires a central theme (Cunningham, Bainbridge et al. 2006). Others proposed that skipping items should make these items irrelevant to the recommendation algorithm (Pampalk, Pohle et al. 2005, Chedrawy and Abidi 2009); and, 3) User Interface Design – this is a known issue for any commercial application, not just music recommendation systems. User experience and user interface design have a huge influence on how people experience and evaluate systems, no matter the purpose and efficiency that these may have (Song, Dixon et al. 2012).

In the next section of this document, music emotion recognition and how it affects music recommendation systems will be discussed.

¹⁵ https://www.facebook.com
¹⁶ https://www.youtube.com

2.4 Music Emotion Recognition on Recommender Systems

The majority of recommender systems rely on data descriptors to suggest items to users and to model users’ profiles. However, a considerable amount of new research, carried out over the last few years, focuses on the study of emotion recognition in recommender systems. The main reason for this increase in research is the continuous improvement in the area of affective computing, specifically in automatic emotion detection techniques. These systems are now able to model and improve user profiles with information from emotion recognition, not only with descriptors such as genre or age. However, it is important to note that the fields of recommender systems and affective computing have been mostly studied independently (Tkalcic, Kosir et al. 2011).

Emotional reactions from users may occur during an interaction with an application, whether or not it has a recommender system and whether or not it belongs to the music domain. These reactions may happen because interactive processes are always producing stimuli, whether visual, auditory or other. These emotional reactions may have an impact on users when it comes to deciding which item to select and, subsequently, may affect the content that will be suggested by the recommendation system (Kahneman 2003). Thus, it is fair to conclude that capturing and collecting emotion information is useful for recommendation systems (Tkalcic, Kosir et al. 2011).

The same reasoning is valid for music recommender systems, as research in this specific domain has also increased recently and the topic is receiving a lot of attention. As discussed before, music is a vehicle to express emotions (Juslin and Sloboda 2011), although the relation of emotions with songs can be highly subjective and difficult to measure. Acoustical features of songs (e.g. duration, harmony and timbre) have an impact on people’s affective state (Kim, Schmidt et al. 2010). Therefore, music emotion recognition can contribute to an improvement of user modelling within music recommendation systems.

Music information retrieval systems use models of emotion to classify emotions, and these can be of two types, categorical or parametric. The first type, categorical, organizes emotions in several clusters containing emotional labels (Kim, Schmidt et al. 2010). Probably the most relevant is Hevner’s adjective circle (Hevner 1936). The second type, parametric, tries to classify emotions in multidimensional models, in which descriptors are placed continuously. The most relevant study is Russell’s circumplex model of affect, in which emotions are distributed on a bidimensional space (Russell 1980).
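A parametric model makes it possible to treat emotional closeness as geometric closeness. As a minimal sketch, assuming each song carries a single (valence, arousal) annotation in the range -1 to 1, songs can be ordered by their Euclidean distance to a target emotion in the bidimensional plane. The song names and coordinates below are invented for illustration, and the sketch is not the implementation evaluated in this thesis.

```python
from math import hypot

# Hypothetical (valence, arousal) annotations in [-1, 1] (assumption).
songs = {
    "calm_piece": (0.6, -0.7),
    "party_track": (0.8, 0.7),
    "angry_riff": (-0.7, 0.8),
    "sad_ballad": (-0.6, -0.6),
}


def rank_by_emotion(target, annotations):
    """Order songs by Euclidean distance to the target point in the plane."""
    target_valence, target_arousal = target
    return sorted(
        annotations,
        key=lambda name: hypot(
            annotations[name][0] - target_valence,
            annotations[name][1] - target_arousal,
        ),
    )


# Target emotion: positive valence and fairly high arousal (e.g. "happy").
ranking = rank_by_emotion((0.7, 0.6), songs)
print(ranking)
```

A categorical model would instead compare cluster labels, which gives a match or no match but no graded notion of how far apart two emotions are.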

Tkalcic et al. propose a framework divided into three stages of user interaction in a recommender system influenced by emotions: 1) entry stage; 2) consumption stage; and, 3) exit stage (see figure 3) (Tkalcic, Kosir et al. 2011). The entry stage refers to the moment when the activity of the user within the recommender system begins. At this initial point, the user enters with an entry mood, a mood that the system cannot predict, as it is related to events that occurred before this action. The entry mood will have an impact on the user’s choice when items are being suggested by the system (e.g. a user may prefer to listen to certain content, depending on whether he or she is happy or sad). Thus, it is essential that the system detects the entry mood and uses this information in the recommendation algorithm (Tkalcic, Kosir et al. 2011). The second stage, consumption, is related to the emotional stimuli that a user receives when consuming suggested items. There are two types of responses: 1) single values - values related to media that do not vary in time (e.g. an image); and, 2) multiple values - related to information that varies in time and, therefore, emotions will also vary (e.g. listening to a song or watching a movie) (Tkalcic, Kosir et al. 2011). In the exit stage, the user is experiencing a specific mood, called exit mood. The latter will influence the next steps and decisions of the user within the system and, therefore, will become the entry mood for the next content that will be consumed. The exit mood information is important to understand the level of satisfaction a user has towards a certain item (Tkalcic, Kosir et al. 2011).

Figure 3: Framework: Role of emotions in user interaction with a recommender system.

Source: (Tkalcic, Kosir et al. 2011)

Users’ moods on recommender systems can be detected in two ways: implicit and explicit detection. Implicit detection is made via sensors, using EEG, ECG, video and voice recording, among other technologies. These sensors are usually applied to the human body and can measure and collect certain body indicators depending on the technology (e.g. skin conductance, heart rate and blood pressure). On the other hand, explicit detection is made via direct input from the user, such as tagging or questionnaires. Although being more