Expert Systems With Applications
Journal homepage: www.elsevier.com/locate/eswa
Combining novelty and popularity on personalised recommendations via user profile learning
Ricardo Mitollo Bertani a, Reinaldo A. C. Bianchi b, Anna Helena Reali Costa a,∗
a Escola Politécnica, Universidade de São Paulo, São Paulo, Brazil
b Department of Electrical Engineering, FEI University Centre, São Bernardo do Campo, Brazil
Article info
Article history:
Received 12 September 2019
Revised 22 November 2019
Accepted 18 December 2019
Available online 26 December 2019
Keywords:
Recommender systems
Machine learning
Data sparsity
Diffusion-based algorithms
User profile
Abstract
Recommender systems have been widely used by large companies in the e-commerce segment as aid tools in the search for relevant contents according to the user's particular preferences. A wide variety of algorithms have been proposed in the literature aiming at improving the process of generating recommendations; in particular, a collaborative, diffusion-based hybrid algorithm has been proposed in the literature to solve the problem of sparse data, which affects the quality of recommendations. This algorithm was the basis for several others that effectively solved the sparse data problem. However, this family of algorithms does not differentiate users according to their profiles. In this paper, a new algorithm is proposed for learning the user profile and, consequently, generating personalised recommendations through diffusion, combining novelty with the popularity of items. Experiments performed in well-known datasets show that the results of the proposed algorithm outperform those from both the diffusion-based hybrid algorithm and the traditional collaborative filtering algorithm, in the same settings.
© 2020 Elsevier Ltd. All rights reserved.
1. Introduction
Due to the massive amount of information produced every day by both humans and machines, it has become increasingly difficult to select the most suitable content in a wide range of options. Recommender systems (RS) are being widely used to help users deal with large volumes of information available on the Internet, especially in the search for the most relevant content according to the particular preferences of the user. Due to the great popularity of this kind of system and the need to ensure that such tools offer recommendations with high quality and relevance to the user, the algorithms that generate these recommendations must be constantly improved.
According to Zhang, Yao, and Sun (2017), an RS can be considered a useful information filtering tool for users aiming at discovering products or services they might be interested in, and its main applications include the recommendation of movies, songs, books, documents, websites, tourist attractions, and learning materials (Lu, Wu, Mao, Wang, & Zhang, 2015).
Kotkov, Wang, and Veijalainen (2016) and Ricci, Rokach, Shapira, and Kantor (2010) consider any content that may be recommended to the user as an "item"; this term can refer to a song, a movie, a book, a service and even friends in a social network. This article will also use the term item as a reference to any type of content recommended by an RS. Also, these items may have attributes that characterize them. For example, a book can be characterized by its genre, topic or author.
∗ Corresponding author.
E-mail addresses: [email protected] (R.M. Bertani), [email protected] (R.A.C. Bianchi), [email protected] (A.H.R. Costa).
An RS can be seen as a particular case of information retrieval, whereby the objective is to infer the level of relevance of a set of unknown items to a target user and to generate a list of recommendations composed of items ordered by relevance.
The process of generating a recommendation can be characterized as a combination of the following features: the type of data available for analysing user preference; the filtering algorithm considered; the approach used (whether or not based on the direct use of the data); the recommendation technique used (e.g., nearest neighbour algorithms, fuzzy models, singular value decomposition, bio-inspired algorithms, among others); and the dispersion level of the dataset (Bobadilla, Ortega, Hernando, & Gutiérrez, 2013).
When applying the analysis of user preference, it is fundamental to know the level of relevance that a set of items has for a given user. This information can be obtained explicitly through a rating value, as in Javari and Jalili (2014), Liu, Hu, Mian, Tian, and Zhu (2014) and Zhang et al. (2017), or it can be implicitly obtained when the RS infers the level of relevance of the item to a user through behavioural analysis, such as counting the number of times the user clicks on-screen elements, or monitoring user downloads or searches, as in Sánchez-Moreno, Gil González, Muñoz Vicente, López Batista, and Moreno García (2016) and Lacerda (2017).
https://doi.org/10.1016/j.eswa.2019.113149
Recommender systems can be broadly categorized as content-based (CB) or collaborative filtering (CF) (Beel, Gipp, Langer, & Breitinger, 2016; Katarya & Verma, 2016; Lu et al., 2015). Both are discussed in more detail in the following section, especially the CF approach, due to its huge popularity and relevance.
The remainder of this paper is structured as follows. In Section 2 we review some relevant related work, especially the details regarding CF approaches. Section 3 describes in detail the diffusion-based hybrid algorithm, which is the basis of our proposal. The proposed method is described in Section 4 and its experimental evaluation is presented in Section 5. Finally, Section 6 gives our conclusions and indications of future work.
2. Related work
Both CB and CF seek to identify the most interesting items according to user preferences; however, each approach performs this same process according to a particular criterion.
The underlying idea of CB is that a user is interested in items that are similar to those she or he previously liked (Zhang & Zeng, 2015; Wang et al., 2017; Deng, Zhong, Lü, Xiong, & Yeung, 2017). CB approaches utilize a series of discrete features of an item to recommend additional items with similar features. CB approaches work with data the user provides, either explicitly (rating) or implicitly (by clicking on a link). Based on these data, a user profile is generated, which is then used to make suggestions to the user. As the user provides more inputs or takes actions on the RS, the engine becomes more and more accurate.
In turn, the idea of CF is that a user likes items that other users like. CF approaches build a model from a user's past behaviour (items previously purchased or selected and/or numerical ratings given to those items) as well as similar decisions made by other users. This model is then used to predict items (or ratings for items) that the user may have an interest in.
Algorithm 1 describes a basic CF algorithm, detailed in Ricci et al. (2010), Patra, Launonen, Ollikainen, and Nandi (2015) and Yang, Wu, Zheng, Wang, and Lei (2016), which uses the Pearson Correlation Coefficient (PCC) (Pearson, 1920) to compute the similarity between a pair of users.

Algorithm 1 – The collaborative filtering (CF) algorithm (Ricci et al., 2010; Patra et al., 2015; Yang et al., 2016).
Require: user u, dataset, |L|, |Vz|.
1: ratingsMatrix ← Extract the rating matrix from users and items of the dataset
2: for each user v ∈ dataset, v ≠ u do
3:     Iuv ← Separate the items rated by both u and v in the ratingsMatrix
4:     Calculate PCC(u, v) using Iuv and ratingsMatrix (Eq. 1)
5: end for
6: Listusers ← Sort users decreasingly according to PCC
7: Vz ← Separate the first |Vz| from Listusers
8: Iu ← Obtain the items u interacted with
9: for each item i ∉ Iu do
10:     Calculate pred(u, i) using Vz as the user neighbourhood (Eq. 2)
11: end for
12: list ← Sort all items by pred(u, i)
13: L ← Separate the first |L| from list
14: return L

PCC is used in Step 4 of Algorithm 1 and can be calculated by:
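$$\mathrm{PCC}(u,v) = \frac{\sum_{i \in I_{uv}} (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}{\sqrt{\sum_{i \in I_{uv}} (r_{u,i} - \bar{r}_u)^2}\,\sqrt{\sum_{i \in I_{uv}} (r_{v,i} - \bar{r}_v)^2}}, \tag{1}$$
where $I_{uv}$ is the set of items rated by both u and v; $r_{u,i}$ and $r_{v,i}$ are the rating values attributed to item i by users u and v, respectively; $\bar{r}_u$ and $\bar{r}_v$ are the average ratings attributed to all items which u and v have interacted with, respectively; with PCC(u, v) ∈ R, −1.0 ≤ PCC(u, v) ≤ 1.0. In Step 10 of Algorithm 1, the prediction pred(u, i) is calculated as (Patra et al., 2015):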
$$\mathrm{pred}(u,i) = \bar{r}_u + \frac{\sum_{b \in V_z} \mathrm{PCC}(u,b) \ast (r_{b,i} - \bar{r}_b)}{\sum_{b \in V_z} \mathrm{PCC}(u,b)}, \tag{2}$$
where $\bar{r}_u$ is the average rating attributed to items which user u has interacted with; $V_z$ is the set of users similar to u (neighbourhood); PCC(u, b) is the similarity between u and b; $r_{b,i}$ is the rating attributed to item i by user b; and $\bar{r}_b$ is the average rating attributed to all items with which user b has had some interaction. According to Lu, Shambour, Xu, Lin, and Zhang (2013) and Kaminskas and Bridge (2016), values of |Vz| between 20 and 40 are considered those that maximize the recommendation results.
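As an illustration of Eqs. (1) and (2), the following Python sketch computes the PCC and the predicted rating on a small in-memory rating dictionary. It is a minimal sketch under stated assumptions, not the implementation used in the experiments: the `ratings` layout (user to item to rating), the function names, and the fallbacks used when a denominator is zero or a neighbour has not rated the item are all hypothetical choices made for the example.

```python
from math import sqrt


def mean_rating(ratings, user):
    """Average of all ratings given by `user` (the r-bar terms of Eqs. 1 and 2)."""
    values = ratings[user].values()
    return sum(values) / len(values)


def pcc(ratings, u, v):
    """Pearson correlation (Eq. 1) over the items rated by both u and v."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0  # assumption: no co-rated items means no similarity
    mu_u, mu_v = mean_rating(ratings, u), mean_rating(ratings, v)
    num = sum((ratings[u][i] - mu_u) * (ratings[v][i] - mu_v) for i in common)
    den_u = sqrt(sum((ratings[u][i] - mu_u) ** 2 for i in common))
    den_v = sqrt(sum((ratings[v][i] - mu_v) ** 2 for i in common))
    if den_u == 0 or den_v == 0:
        return 0.0  # assumption: an undefined correlation is treated as 0
    return num / (den_u * den_v)


def predict(ratings, u, item, neighbours):
    """Predicted rating of `item` for `u` (Eq. 2) from the neighbourhood Vz."""
    mu_u = mean_rating(ratings, u)
    # assumption: only neighbours that actually rated the item contribute
    raters = [b for b in neighbours if item in ratings[b]]
    sims = {b: pcc(ratings, u, b) for b in raters}
    norm = sum(sims.values())
    if norm == 0:
        return mu_u  # assumption: fall back to the user's own average
    num = sum(sims[b] * (ratings[b][item] - mean_rating(ratings, b)) for b in raters)
    return mu_u + num / norm
```

With a single neighbour the similarity cancels out, so the prediction reduces to the target user's average plus the neighbour's deviation from its own average on that item.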
CF algorithms stand out as the most widely used (Fu, Qu, Moges, & Lu, 2018; Wang et al., 2017; Yang et al., 2016). However, they still suffer from the data sparsity problem, characterized by the situation in which there are a large number of users and items in the system, but only a small set of ratings assigned to items by users. Thus, the user-item rating matrix is very sparse (Ma et al., 2015).
The basic ideas of CB and CF approaches are rather vague and leave room for different approaches (Beel et al., 2016; Betru & Onana, 2017; Deng et al., 2017; Katarya & Verma, 2016; Lu et al., 2015; Wang et al., 2017; Zeng, Zeng, Shang, & Zhang, 2013; Zhang et al., 2017; Zhou et al., 2010).
Notably, proposals based on the physical process of mass diffusion showed significant results in mitigating the data sparsity problem (Deng et al., 2017; Wang et al., 2017; Zeng et al., 2013; Zhang & Zeng, 2015; Zhou et al., 2010). In particular, the hybrid algorithm presented in Zhou et al. (2010) served as the basis for developing several approaches, such as Zeng et al. (2013), Zhang and Zeng (2015) and Deng et al. (2017). However, this algorithm and its variations do not differentiate the particular preferences of users during the recommendation process.
We are here particularly interested in determining the degree of novelty or popularity that represents the preferences of a particular target user. This personalised preference is expressed in the user profile that commands the decisions of the new RS algorithm we propose, the UPOD ("User Profile-Oriented Diffusion") algorithm. Thus, according to user preference, a certain level of combination of novelty and popularity is defined, directing the selection and ordering of recommended items. Our experiments showed that user satisfaction was higher with the list of recommendations generated by UPOD than with the list generated with commonly used techniques.
It is important to emphasize that, without the inclusion of the user profile during the recommendation process, the system may be unable to fully satisfy users regarding their preferences and may simply recommend the most predictable content.
3. Diffusion-based recommendation algorithms
This section presents the hybrid algorithm proposed by Zhou et al. (2010), which is based on the physical mass diffusion process and gives a solution to the apparent diversity-accuracy dilemma of recommender systems. The concern in this dilemma is to find an appropriate combination of methods based on accuracy (providing popularity) and diversity (providing novelty). Their algorithm, called MDHS, combines the Mass Diffusion algorithm (MD) (Zhou, Ren, Medo, & Zhang, 2007) and the Heat Spreading algorithm (HS) (Zhang, Blattner, & Yu, 2007) for generating a list of recommendations.
MDHS represents an RS as a user-item bipartite graph, formally defined as G = {U, I, E}, where $U = \{u_1, u_2, \ldots, u_N\}$, $I = \{i_1, i_2, \ldots, i_M\}$ and $E = \{e_1, e_2, \ldots, e_K\}$ are the user set, the item set and the set of graph edges, respectively. From a previously known training dataset, graph G is constructed, assigning each user to a vertex $u_N \in U$, each item to a vertex $i_M \in I$, and if a user $u_N$ interacted with any item $i_M$, an edge $e_K \in E$ is inserted into G, making $u_N$ adjacent to $i_M$. Interaction here means that the user purchased or rated a given item, for example.
For a target user u that is present in the database, the following three steps are performed:
Step 1. A resource value r(u, i) is assigned to each item i in G, according to the following rule: if there is an edge between i and the target user, then r(u, i) = 1, otherwise r(u, i) = 0. In this step u is the target user.
Step 2. The resource values are redistributed, through a propagation process, from the item side to the user side in G, where each user v ∈ U at a destination vertex receives a new recalculated resource value $r'(v, i)$ from all the resource values of adjacent items in graph G.
Step 3. The resource values assigned to the users are now redistributed from the user side to the item side in G. Each item i receives a new resource value $r''(v, i)$ recalculated from all the resource values of adjacent users, computed in Step 2.
The calculation of the new resource values described in Steps 2 and 3, represented respectively by $r'(v, i)$ and $r''(v, i)$, depends on the algorithm considered (MD, HS or MDHS) and on the number of interactions between users and items in the RS database. The number of interactions represents the degree of the vertices, w(u) and w(i) being respectively the degree of the vertex representing user u and the degree of the vertex representing item i.
Thus, when considering only the MD algorithm, the resource values for steps 2 and 3 are given, respectively, by:
$$r'_{MD}(v,i) = \sum_{i \in I} \frac{r(u,i)}{w(i)}, \quad v \in U,\ i \in I, \tag{3}$$
and
$$r''_{MD}(v,i) = \sum_{v \in U} \frac{r'_{MD}(v,i)}{w(v)}, \quad v \in U,\ i \in I. \tag{4}$$
Similarly, when only the HS algorithm is used, the resource values for steps 2 and 3 are given, respectively, by:
$$r'_{HS}(v,i) = \sum_{i \in I} \frac{r(u,i)}{w(v)}, \quad v \in U,\ i \in I, \tag{5}$$
and
$$r''_{HS}(v,i) = \sum_{v \in U} \frac{r'_{HS}(v,i)}{w(i)}, \quad v \in U,\ i \in I. \tag{6}$$
Figs. 1 and 2 illustrate the application of the propagation process (steps 1, 2 and 3) in a graph considering the MD and HS algorithms, respectively.
In Fig. 1, the target user is $u_4$, which previously interacted with items $i_3$ and $i_5$. Following the MD algorithm, resource values are propagated to the user side using Eq. (3); in this case, $i_3$ propagates 1/2 to adjacent users in the graph and $i_5$ propagates 1/3, resulting in 5/6 at $u_4$. In the last step, the resources are propagated to the side of the items according to Eq. (4); the new value in item $i_3$, for example, is 5/12 propagated from $u_4$ added to 1/6 propagated from $u_1$, resulting in 7/12.
Fig. 2 illustrates the propagation of the HS algorithm, following Eqs. (5) and (6). In the figure, the target user is $u_4$, which previously interacted with items $i_3$ and $i_5$. Resource values are then propagated from the item side to the user side, resulting in 1/3 in $u_1$ (sum of the resources received from $i_1$, $i_2$ and $i_3$, divided by the degree of $u_1$), and 1/2 in $u_2$ (sum of the resources received from $i_2$ and $i_5$, divided by the degree of $u_2$). In the last step, the resources are propagated from the user side to the item side according to Eq. (6); the new value in item $i_3$, for example, is 2/3 (sum of 1/3 propagated from $u_1$ and 1 propagated from $u_4$, divided by the degree of $i_3$, which is 2).
It is worth observing that, according to Figs. 1 and 2, the MD algorithm tends to generate recommendations composed of popular items (vertices with higher degrees in the graph), while the HS algorithm tends to generate recommendations composed of lesser-known items (vertices with lower degrees in the graph).
The MDHS algorithm combines MD and HS through a tuning parameter λ, which merges Eqs. (3) and (5) into $r'_{HB}(v,i)$ (Step 2) and Eqs. (4) and (6) into $r''_{HB}(v,i)$ (Step 3):
$$r'_{HB}(v,i) = \sum_{i \in I} \frac{r(u,i)}{w(i)^{\lambda}\, w(v)^{(1-\lambda)}}, \quad v \in U,\ i \in I, \tag{7}$$
and
$$r''_{HB}(v,i) = \sum_{v \in U} \frac{r'_{HB}(v,i)}{w(v)^{\lambda}\, w(i)^{(1-\lambda)}}, \quad v \in U,\ i \in I. \tag{8}$$
When λ = 0 the HS algorithm is used, and when λ = 1 the MD algorithm is used. Any value between 0 and 1 means that a combination of both methods is used. In Zhou et al. (2010), the values of λ are defined as 0 ≤ λ ≤ 1, and they recommend using λ = 0.5.
After the execution of the three steps described before, any item that, at the beginning of the process, did not have any interaction with target user u and that has received a positive resource value ($r''(u,i) > 0$) is included in the prediction list, ordered by its final resource value $r''(u,i)$. The items with the highest resource values in the prediction list will compose the recommendation list for target user u. The list of recommendations consists of |L| items and is sorted in descending order by the item resource values; the first position contains the most relevant item and the last one contains the least relevant item.
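The three propagation steps and the final ranking can be sketched in Python as follows. This is only an illustrative sketch: the function names, the edge-list representation and the toy graph `EDGES` are assumptions made for the example, and the sums are taken over adjacent vertices, as the step descriptions state. The toy graph is a hypothetical one chosen to be consistent with the values quoted in the text for target user u4 (the links of u3 in particular are invented).

```python
from collections import defaultdict

# Hypothetical toy graph: (user, item) interactions. Only the links of
# u1, u2 and u4 follow the worked example; those of u3 are assumed.
EDGES = [
    ("u1", "i1"), ("u1", "i2"), ("u1", "i3"),
    ("u2", "i2"), ("u2", "i5"),
    ("u3", "i4"), ("u3", "i5"),
    ("u4", "i3"), ("u4", "i5"),
]


def mdhs_scores(edges, target, lam=0.5):
    """Hybrid diffusion (Eqs. 7 and 8); lam=1 gives MD, lam=0 gives HS.

    `edges` must be a list of (user, item) pairs.
    """
    items_of = defaultdict(set)  # user -> adjacent items
    users_of = defaultdict(set)  # item -> adjacent users
    for user, item in edges:
        items_of[user].add(item)
        users_of[item].add(user)

    # Step 1: initial resource, 1 for items linked to the target user.
    r0 = {i: 1.0 if target in users_of[i] else 0.0 for i in users_of}

    # Step 2: propagate from the item side to the user side.
    r1 = {}
    for v, its in items_of.items():
        w_v = len(its)
        r1[v] = sum(r0[i] / (len(users_of[i]) ** lam * w_v ** (1 - lam)) for i in its)

    # Step 3: propagate from the user side back to the item side.
    r2 = {}
    for i, us in users_of.items():
        w_i = len(us)
        r2[i] = sum(r1[v] / (len(items_of[v]) ** lam * w_i ** (1 - lam)) for v in us)
    return r2


def recommend(edges, target, lam=0.5, size=30):
    """Unseen items with positive resource, highest resource first."""
    scores = mdhs_scores(edges, target, lam)
    seen = {i for u, i in edges if u == target}
    candidates = [(s, i) for i, s in scores.items() if i not in seen and s > 0]
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in candidates[:size]]
```

On this toy graph, `mdhs_scores(EDGES, "u4", lam=1.0)` gives 7/12 for i3 and `mdhs_scores(EDGES, "u4", lam=0.0)` gives 2/3 for i3, matching the MD and HS values discussed for Figs. 1 and 2.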
Some extensions of MDHS have been proposed in the RS literature: the Semi-Local Diffusion algorithm (SLD) (Zeng et al., 2013) allows the propagation process to be repeated in G. In practice, this means that steps 2 and 3 occur more than once, with recommendations reaching farther user-item neighbourhoods.
However, both MDHS and SLD algorithms always consider the same λ value for all RS users, without considering any information about their profiles. This means that for each value of λ, recommendation lists are generated with the same degree of combination of MD and HS for all users.
We believe that using the same value of λ for all RS target users, without differentiating them according to their specific profiles, leads to worse results. We here propose a framework that tunes the λ parameter according to the user profile, and we combine this λ parameter with the MDHS algorithm to generate personalised recommendations. Our proposal is described in the next section.
4. Proposal
We propose the UPOD ("User Profile Oriented Diffusion") algorithm. UPOD runs the MDHS algorithm with a λ value learned specifically for the target user, in order to generate customized recommendations for that user.
Fig. 1. The three propagation steps with the MD algorithm in a user-item bipartite graph. Users are represented by circles, items are represented by bold circles, the target user is indicated by a shaded circle and the most relevant item to be recommended is marked with a dotted rectangle (modified figure from Zhou et al. (2010)).
Fig. 2. The three propagation steps with the HS algorithm in a user-item bipartite graph. Users are represented by circles, items are represented by bold circles, the target user is indicated by a shaded circle and the most relevant item to be recommended is marked with a dotted rectangle (modified figure from Zhou et al. (2010)).
Fig. 3. The two phases of the UPOD algorithm.
UPOD operates in two phases: a training phase and a recommendation phase, as shown in Fig. 3.
The training phase is responsible for preprocessing the data, defining the features of the users, constructing the bipartite graph of interactions, clustering users according to their features, determining the profile of each cluster from the λ that characterizes the cluster and, finally, using the pairs formed by the user features and the best λ of each cluster to train a classifier that will provide the best λ from the features of a target user. This procedure is illustrated in Algorithm 2.
The training phase is initially responsible for preprocessing the training dataset (Step 1), by deleting users with invalid or empty data, removing the timestamp, transforming the zip code into a country and state format, etc. Also in this step, attribute values are all transformed into categorical values. For example, the user age attribute is discretized every 5 years and each interval represents one category.
From Step 2 to 11, the preprocessed data are grouped into k clusters by performing the k-modes algorithm (Huang, 1998). The k-modes algorithm uses a dissimilarity measure for categorical data, represents cluster centroids with modes, and uses a frequency-based method to update modes in the clustering process. This algorithm was chosen due to its simplicity, ease of understanding and implementation, and its efficiency, which is O(tkn), where n is the number of data objects, k is the number of clusters and t is the number of iterations. Since k and t are small, k-modes is considered a linear algorithm; it is a variant of k-means for categorical data.
Algorithm 2 – UPOD training phase.
Require: dataset, Feat, kmin, kmax, Λ, Metric
1: TrainData ← prep(dataset)
2: Eval ← 0
3: for k = kmin to k = kmax do
4:     Clustersk ← clustering(TrainData, Feat, k)
5:     Evalk ← evaluate(Clustersk)
6:     if Evalk is better than Eval then
7:         Eval ← Evalk
8:         BestClusters ← Clustersk
9:         Bestk ← k
10:     end if
11: end for
12: G ← buildGraph(TrainData)
13: TrainSet ← {}
14: for i = 1 to i = Bestk do
15:     λbest[i] ← BestLambda(G, BestClusters[i], Λ, Metric)
16:     for each user in BestClusters[i] do
17:         userFeat ← ExtractFeat(user, Feat)
18:         TrainSet ← TrainSet ∪ {<userFeat, λbest[i]>}
19:     end for
20: end for
21: classifier ← TrainClassifier(TrainSet)
22: return classifier, G

Let X and Y be attribute vectors that represent categorical data, where $x_j \in X$ and $y_j \in Y$ are the values of each data attribute. The measure of dissimilarity between X and Y in k-modes is given by:
$$d(X,Y) = \sum_{j=1}^{m} \delta(x_j, y_j), \tag{9}$$
where $\delta(x_j, y_j)$ is equal to 0 if $x_j = y_j$ and is equal to 1 if $x_j \neq y_j$. Consider an example in which vectors have 3 attributes: $a_1$ = gender, $a_2$ = profession, and $a_3$ = age group. An X vector could be $x_1$ = male, $x_2$ = teacher and $x_3$ = 15–20, and Y could be $y_1$ = female, $y_2$ = teacher and $y_3$ = 20–25. In this case, the measure of dissimilarity is d(X, Y) = 1 + 0 + 1 = 2, indicating that, of the 3 attributes, only 1 is identical in both vectors.
A mode of a set of n categorical objects $X_i$ is a vector Q (a categorical attribute vector) that has the least dissimilarity to all vectors (elements) in a given cluster and therefore minimizes the function D(X, Q):
$$D(X,Q) = \sum_{i=1}^{n} d(X_i, Q). \tag{10}$$
Thus, in Step 4 of Algorithm 2, the clustering function represents the k-modes algorithm, used with k varying from kmin to kmax and, for each k, an evaluation of the k clusters is performed.
The k-modes algorithm is shown in Algorithm 3 (Huang, 1998).
Algorithm 3 – The k-modes algorithm.
Require: TrainData, Feat, k
1: Select k initial modes, one for each cluster
2: for each object o from <TrainData, Feat> do
3:     Allocate o to the cluster whose mode is the nearest to it according to Eq. 9
4: end for
5: for each mode from each cluster do
6:     Retest the dissimilarity of objects against the current modes
7:     if its nearest mode belongs to another cluster then
8:         Reallocate the object to that cluster
9:         Update the modes of both clusters
10:     end if
11: end for
12: Clustersk ← set of k final clusters
13: return Clustersk
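The building blocks of k-modes, namely the dissimilarity of Eq. (9), the mode that minimizes Eq. (10), and the allocation step of Algorithm 3, can be sketched in Python as below. This is a minimal sketch, not a full k-modes implementation: the function names and the representation of objects as tuples of categorical values are assumptions made for the example.

```python
from collections import Counter


def dissimilarity(x, y):
    """Simple matching dissimilarity (Eq. 9): number of differing attributes."""
    return sum(1 for xj, yj in zip(x, y) if xj != yj)


def mode_of(cluster):
    """Attribute-wise most frequent value, which minimises D(X, Q) of Eq. 10."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*cluster))


def total_dissimilarity(cluster, q):
    """D(X, Q) of Eq. 10 for a candidate mode q."""
    return sum(dissimilarity(x, q) for x in cluster)


def assign(objects, modes):
    """Allocation step: index of the nearest mode for each object."""
    return [
        min(range(len(modes)), key=lambda c: dissimilarity(obj, modes[c]))
        for obj in objects
    ]
```

For the example given in the text, `dissimilarity(("male", "teacher", "15-20"), ("female", "teacher", "20-25"))` evaluates to 2.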
In Step 5 of Algorithm 2, for each set of k clusters returned by the clustering function that implements the k-modes algorithm (Algorithm 3), we evaluate the effectiveness of the data partitioning achieved. For this, we use the entropy-based clustering criterion given by Chen and Liu (2009). Consider a dataset $X = \{x_1, x_2, \ldots, x_n\}$ with n instances and c columns. Each instance $x_i$ is described by c columns ($x_i = \{x_{i1}, x_{i2}, \ldots, x_{ic}\}$) and each column of $x_i$ has a value from domain $A_i$, in which there are a finite number of unique categorical values. If $v \in A_i$, then the probability of $x_i = v$ is given by $p(x_i = v)$ and the column entropy of $A_i$ is given by:
$$H(A_i \mid X) = -\sum_{v \in A_i} p(v \mid X) \log_2 p(v \mid X), \tag{11}$$
where $p(v \mid X)$ is an empirical probability estimated in X. The estimated entropy of the whole dataset is the sum of all the column entropies and is represented by H(X). Besides, suppose dataset X is partitioned into k clusters, with $C^k = \{C_1, C_2, \ldots, C_k\}$ being the set of clusters (partition), $C_j$ a cluster, $n_j$ the number of objects in $C_j$ and $H(C_j)$ the cluster entropy. According to Chen and Liu (2009), the entropy-based clustering criterion is given by
$$Opt(C^k) = \frac{1}{c}\left( H(X) - \frac{1}{n} \sum_{j=1}^{k} n_j H(C_j) \right). \tag{12}$$
Notice that H(X) is fixed and depends on the dataset, so that the maximization of $Opt(C^k)$ is equivalent to minimizing the expression $\frac{1}{n} \sum_{j=1}^{k} n_j H(C_j)$, which is named the expected entropy of partition $C^k$. Chen and Liu (2009) argue that the "best k" is the one that results in the minimum expected entropy among all the partitions evaluated.
Hence, at the end of this loop (Steps 3–11), we have the best set of clusters in BestClusters and the corresponding best k in Bestk.
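To make the criterion concrete, the Python sketch below computes the column entropy of Eq. (11), the dataset entropy H(X), the expected entropy of a partition and the criterion of Eq. (12). It is an illustrative sketch only: function names are hypothetical, and rows are assumed to be equal-length tuples of categorical values, with each cluster given as a list of such rows.

```python
from collections import Counter
from math import log2


def column_entropy(values):
    """Empirical entropy of one categorical column (Eq. 11)."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def dataset_entropy(rows):
    """H(X): sum of the column entropies of a set of rows."""
    return sum(column_entropy(column) for column in zip(*rows))


def expected_entropy(clusters):
    """Size-weighted average of the cluster entropies."""
    n = sum(len(cluster) for cluster in clusters)
    return sum(len(cluster) * dataset_entropy(cluster) for cluster in clusters) / n


def opt_criterion(rows, clusters):
    """Entropy-based clustering criterion of Eq. 12."""
    c = len(rows[0])  # number of columns
    return (dataset_entropy(rows) - expected_entropy(clusters)) / c
```

A partition whose clusters are internally homogeneous has an expected entropy of 0 and therefore the largest possible value of the criterion for that dataset.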
Now that we have the best set of dataset clusters, we need to determine for each cluster the best λ, λbest, which represents the profile of the users included in that cluster. This is done in Step 15 of Algorithm 2, with algorithm BestLambda given in Algorithm 4.

Algorithm 4 – The BestLambda algorithm.
Require: G, BestCluster[i], Metric, Λ
1: Evalλ ← 0
2: for each λ value in Λ do
3:     RecListλ ← MDHS(G, BestCluster[i], λ)
4:     Eval ← Metric(RecListλ)
5:     if Eval is better than Evalλ then
6:         Evalλ ← Eval
7:         λbest[i] ← λ
8:     end if
9: end for
10: return λbest[i]

In this algorithm, each recommendation list generated with each λ is evaluated (Step 4) according to metrics such as Ranking Score, Precision and Recall (Ma, Ren, Wu, Wang, & Feng, 2017; Zeng, An, Liu, Shang, & Zhou, 2014), to determine the λ that characterizes the best-generated recommendation list; this λ will be λbest. Note that in our system, λbest represents the appropriate combination of novelty and popularity for recommendations according to the user in question, and the set Λ contains all the possible lambda values considered by the authors of MDHS in the base article, being: Λ = {0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0}.
Once the λbest[i] for cluster i in the set BestClusters has been determined, we collect the features of each user included in that cluster to compose a training pair for the classifier (see Algorithm 2). This pair, <userFeat, λbest[i]>, is inserted into the classifier training dataset, TrainSet (Steps 17 and 18). Finally, using the training data TrainSet, a classifier is trained in Step 21 so that, given features such as age, gender, and occupation of a given target user, the classifier can indicate which λ best represents that user profile. The output of the training phase is the trained classifier and the graph G of interactions between users and items in the original dataset. For the results presented here, we use an SVM (Support Vector Machine) classifier (Witten et al., 1999). However, any other kind of classifier could be used.
The classifier is used to predict the best value of λ for a target user in the recommendation phase. The recommendation phase requires as input the interaction graph G, the set of user attributes Feat, the target user, the classifier trained in the training phase and a size for the recommendation list L. The UPOD recommendation phase is illustrated in Algorithm 5.

Algorithm 5 – UPOD recommendation phase.
Require: G, Feat, user, classifier, |L|.
1: userFeatures ← ExtractFeat(user, Feat)
2: λuser ← classifier(userFeatures)
3: recommendList ← MDHS(G, user, λuser, |L|)
4: return recommendList
Table 1
Sparsity level of the datasets used.

Dataset          Sparsity
MovieLens        0.9371
Last.FM          0.9989
Book-Crossing    0.9996
First, the target user's feature values, userFeatures, are extracted in Step 1 considering the features defined in the set Feat and the values indicated by the target user. Values represented in userFeatures are inputs to the trained classifier, which predicts the proper λ value for the user, represented by λuser at Step 2. This variable λuser reflects the target user profile, defining the appropriate combination of the MD algorithm, which selects popular items, with the HS algorithm, which selects items based on novelty. Then, in Step 3 the MDHS algorithm is executed on graph G for the target user with his/her personalised tuning parameter (λuser) and generates the recommendation list of size |L|. Note that now the generated recommendation list combines popular and lesser-known items according to the particular user profile.
5. Experiments and results
We performed a series of experiments to validate the proposed algorithm. For this, we performed a comparative analysis with two recommender systems and three different databases. We evaluate the results according to some metrics that are widely used in this type of application. The details of the experimental procedures and the results are described below.
5.1. Databases
We considered three well-known datasets in the RS literature. The datasets used are:
The MovieLens dataset (Harper & Konstan, 2015) contains 910 users, 1672 items and 95,579 interactions. The user features are age, gender and location.
The Last.FM dataset (Celma, 2010). This work used a random subset of the total dataset, containing 2846 users, 4995 items and 14,583 interactions. The user features are age, gender and country.
The Book-Crossing dataset (Ziegler, McNee, Konstan, & Lausen, 2005) is also a random subset of the total dataset, containing 3421 users, 26,811 items and 35,572 interactions. The user features are age and location.
According to Lu et al. (2013) and Shambour and Lu (2015), the sparseness of a given dataset can be calculated by
$$\mathrm{sparsity} = 1 - \frac{K}{NM}, \tag{13}$$
where K, N and M are, respectively, the number of interactions between users and items, the number of users and the number of items.
Using Eq. (13), we have computed the sparsity level of the MovieLens, Last.FM and Book-Crossing datasets shown in Table 1. The Book-Crossing dataset is the sparsest.
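As a quick check of Eq. (13), the short Python sketch below recomputes the sparsity from the user, item and interaction counts reported for each dataset. The function name and the dictionary layout are assumptions made for the example; the counts are those given in Section 5.1.

```python
def sparsity(interactions, users, items):
    """Sparsity of a user-item matrix (Eq. 13): 1 - K / (N * M)."""
    return 1.0 - interactions / (users * items)


# (K, N, M) as reported for each dataset in Section 5.1.
DATASETS = {
    "MovieLens": (95_579, 910, 1_672),
    "Last.FM": (14_583, 2_846, 4_995),
    "Book-Crossing": (35_572, 3_421, 26_811),
}

for name, (k, n, m) in DATASETS.items():
    print(f"{name}: {sparsity(k, n, m):.4f}")
```

The printed values agree with Table 1 to within about one unit in the fourth decimal place; the small differences are what rounding versus truncation of the last digit would produce.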
5.2. Algorithms for comparison
We have comparatively evaluated the proposed algorithm with two other algorithms: (i) the MDHS algorithm with λ = 0.5, following the authors' suggestion, and (ii) a nearest-neighbourhood collaborative filtering (CF) algorithm, which was previously detailed in Algorithm 1 in Section 2.
5.3. Evaluation method and metrics
We used the 10-fold cross-validation technique to validate our proposal. Cross-validation is a model validation technique for assessing how the results of a statistical analysis will generalize to an independent dataset. This approach involves randomly dividing the set of data into k groups, or folds, of approximately equal size. The first fold is treated as a testing set, and the method is fit on the remaining k − 1 folds (called the training set). The procedure is repeated for each group, and the validation results are averaged over the rounds to give an estimate of the model predictive performance. We only consider users that are present in both the training set and the testing set. These users are called "valid users" and are represented by $U_{valid}$.
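The fold construction and the selection of valid users can be sketched in Python as follows. This is an assumption-laden illustration rather than the procedure actually used: it assumes the data being split are (user, item) interaction pairs, and the function names and the fixed random seed are hypothetical.

```python
import random


def k_fold_splits(interactions, k=10, seed=0):
    """Yield (train, test) pairs; each fold serves once as the testing set."""
    shuffled = list(interactions)
    random.Random(seed).shuffle(shuffled)  # random division into folds
    folds = [shuffled[f::k] for f in range(k)]
    for f in range(k):
        test = folds[f]
        train = [x for g, fold in enumerate(folds) if g != f for x in fold]
        yield train, test


def valid_users(train, test):
    """Users present in both the training set and the testing set."""
    return {u for u, _ in train} & {u for u, _ in test}
```

Metrics would then be computed per round for the valid users only and averaged over the k rounds.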
We used three metrics to validate the recommendation list generated by the systems: Recall, Precision, and Ranking Score (Ma et al., 2017; Zeng et al., 2014).
Recall: It measures the proportion of items in the testing set that correspond to items in the recommendation list generated for a target user u (Yang et al., 2016):
$$Re_u(L) = \frac{d_u(L)}{I_u}, \tag{14}$$
where $d_u(L)$ is the number of items related to target user $u \in U_{valid}$ in the testing set and also present in recommendation list L; $I_u$ is the total number of items related to $u \in U_{valid}$ in the testing set. The higher its value, the more items in the testing set correspond to recommended items.
Precision: It measures the proportion of items in the user recommendation list that correspond to items related to the target user in the testing set, and is given by (Yang et al., 2016):
$$Pe_u(L) = \frac{d_u(L)}{|L|}, \tag{15}$$
where $d_u(L)$ is the number of items related to the target user $u \in U_{valid}$ in the testing set that are present in the recommendation list and |L| is the size of the recommendation list. The higher its value, the more items in the recommendation list correspond to items related to the user in the testing set.
Ranking Score: It takes into account the position in which items related to the target user in the testing set appear in the generated recommendation list. This position is known as the "rank" of the item, which is obtained based on the position of the item in the recommendation list divided by the number of items initially unknown to the user (Chen, Zeng, & Chen, 2015; Yu, Zeng, Gillard, & Medo, 2015):
$$RS_u = \frac{\sum_{i \in I_u} \mathrm{rank}(i)}{|I_u|}, \tag{16}$$
where $I_u$ is the set of items related to $u \in U_{valid}$ in the testing set and rank(i) represents the rank of item i in the generated recommendation list. This means that the closer to the first positions the item is, the lower the rank value.
Table 2
Experimental results for the three datasets.

(a) Results from the MovieLens dataset, with k avg = 197 and k σ = 4.25 for the UPOD algorithm. The average processing times for CF, MDHS and UPOD were 5 min, 30 s and 5 s, respectively.

Algorithm         RS avg   RS σ    Re(L) avg   Re(L) σ   Pe(L) avg   Pe(L) σ
CF                0.497    0.002   0.014       0.002     0.003       4.6 × 10^-4
MDHS (λ = 0.5)    0.080    0.001   0.299       0.008     0.086       0.001
UPOD              0.074    0.001   0.308       0.007     0.091       0.002

(b) Results from the Last.FM dataset, with k avg = 196 and k σ = 2.94 for the UPOD algorithm. The average processing times for CF, MDHS and UPOD were 2 min, 18 s and 9 s, respectively.

Algorithm         RS avg   RS σ    Re(L) avg   Re(L) σ   Pe(L) avg     Pe(L) σ
CF                0.444    0.004   0.018       0.004     9.6 × 10^-4   2 × 10^-4
MDHS (λ = 0.5)    0.252    0.007   0.119       0.006     0.006         2.9 × 10^-4
UPOD              0.251    0.006   0.174       0.014     0.009         7.4 × 10^-4

(c) Results from the Book-Crossing dataset, with k avg = 192 and k σ = 8.5 for the UPOD algorithm. The average processing times for CF, MDHS and UPOD were 20 min, 30 s and 5 s, respectively.

Algorithm         RS avg   RS σ    Re(L) avg     Re(L) σ       Pe(L) avg      Pe(L) σ
CF                0.406    0.007   3.6 × 10^-4   2.4 × 10^-4   3.5 × 10^-5    1 × 10^-5
MDHS (λ = 0.5)    0.395    0.007   0.002         4.7 × 10^-4   2 × 10^-4      4.9 × 10^-5
UPOD              0.394    0.008   0.003         0.001         3.76 × 10^-4   7.3 × 10^-5
5.4. Implementation details
An important aspect to consider is how the proposed new algorithm, UPOD, can be compared with the other two algorithms in terms of time efficiency. To explain how we can measure this efficiency, it is first necessary to understand some implementation details involved in the structure, as well as the computational conditions under which the algorithms were executed.
One of the most important differences between the UPOD and MDHS implementations is that the former performs the propagation steps using a parallel computing strategy. This means that, in our implementation, the propagation steps occur for more than one user at a time, depending on the number of processor cores, unlike the original MDHS implementation, which performs propagations for each user, one after the other.
All the algorithms were executed on a Debian GNU/Linux machine with 7.5 GB RAM and 2 vCPUs available on the Google Cloud Platform (Google Compute Engine), and all the implementations were made with the Java™ programming language. Under these conditions, the UPOD training phase time was 3 min for the MovieLens database, 41 s for Last.FM and 2 min and 55 s for Book-Crossing. Of course, this may vary depending on the computational structure used and the size of the dataset, the number of user attributes, and the number of graph edges.
Note that the training phase takes place offline and only when needed.
5.5. Results and discussion
For all the experiments, we considered |L| = 30 as the size of the recommendation list. In the CF algorithm, the neighbourhood size was |V_z| = 30; and for the UPOD algorithm, k_min was defined experimentally as 100, while k_max was defined in order to maintain a maximum of 15–20 users in each cluster, so that k_max = 200 for all datasets. Table 2 shows the results from each dataset. Results in bold are those in which UPOD outperformed the CF and MDHS algorithms at the same time, according to the t-test (95% confidence interval), meaning that the difference between the results is considered to be statistically significant. k_avg and k_σ were calculated for each dataset, after the application of the entropy-based clustering criterion to each of the 10 executions.
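The significance check over repeated executions can be illustrated with a short sketch. The text does not state which variant of the t-test was used, so this example assumes a two-sample Student's t-test with pooled variance and sample standard deviations; the per-run recall values are made up for illustration and are not the paper's data, and the names `pooled_t` and `T_CRIT_DF18` are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 5% critical value of Student's t with 18 degrees of freedom
# (10 + 10 - 2), i.e. the threshold for a 95% confidence level.
T_CRIT_DF18 = 2.101


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance (assumed test variant)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


# Made-up recall values for 10 executions of two algorithms.
runs_upod = [0.170, 0.181, 0.165, 0.190, 0.158, 0.176, 0.184, 0.169, 0.172, 0.175]
runs_mdhs = [0.115, 0.124, 0.119, 0.111, 0.126, 0.120, 0.117, 0.122, 0.113, 0.123]

t_value = pooled_t(runs_upod, runs_mdhs)
significant = abs(t_value) > T_CRIT_DF18
print(f"avg={mean(runs_upod):.3f} sigma={stdev(runs_upod):.3f} "
      f"t={t_value:.2f} significant={significant}")
```

The same `mean` and `stdev` calls would summarise any per-run quantity, such as the number of clusters selected in each execution.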
Table 2(a) presents the results for the Movielens dataset, Table 2(b) presents the results for the Last.FM dataset, and Table 2(c) presents the results for the Book-Crossing dataset.
Table 2 shows that UPOD outperformed the other algorithms on the metrics used, although the difference between the results was not statistically significant in every case. For example, according to the t-test, UPOD does not outperform CF and MDHS in the RankScore (RS) metric on the Last.FM and Book-Crossing datasets. However, our proposal systematically outperformed the others according to the Recall and Precision metrics.
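For readers who want to reproduce the Recall and Precision columns, the per-user computation can be sketched as below. This assumes the definitions commonly used in the diffusion-based recommendation literature (hits divided by the list length for precision, hits divided by the number of held-out items for recall); the function name `precision_recall_at_l` and the toy inputs are hypothetical.

```python
def precision_recall_at_l(ranked_items, test_items, list_size=30):
    """Precision and recall of the top-`list_size` recommendations for one user."""
    top = ranked_items[:list_size]
    hits = sum(1 for item in top if item in test_items)
    precision = hits / list_size
    # A user with no held-out items contributes a recall of 0 in this sketch.
    recall = hits / len(test_items) if test_items else 0.0
    return precision, recall


# Toy example: a list of length 2 with one hit among two held-out items.
p, r = precision_recall_at_l(["a", "b", "c", "d"], {"b", "x"}, list_size=2)
```

Averaging these per-user values over all users with held-out items would give dataset-level figures comparable in form to Re(L) and Pe(L).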
The inclusion of the user profile is a very important contribution of our proposal as it allows for more refined and personalised recommendation content.
In addition, the user profile can also be obtained in ways different from that used in the UPOD system. For example, it can be derived from social network data or even from the user behaviour observed in e-commerce.
6. Conclusions and future work
This article contributed the UPOD algorithm, which allows the automatic tuning of the λ parameter that determines the mixing degree in the mass diffusion process. UPOD makes it possible to generate personalised recommendations, combining popularity and novelty according to each user's particular profile. Experiments using three well-known databases have shown that supervised learning of the λ parameter according to user profile can improve the quality of recommendations in sparse datasets.
Future research should include the rating values assigned to items in the calculation of resource values during the mass diffusion process in the bipartite graph. Thus, better-rated items in the system would be recommended more strongly than poorly rated items. Another possible improvement would be to allow a greater amount of resource propagation in the graph, as in the Semi-Local Diffusion algorithm (Zeng et al., 2013). Finally, several aspects must be integrated into the design of a recommender system so that it can generate even better recommendations that are more in line with each user's profile.
Declaration of Competing Interest
None.
CRediT authorship contribution statement
Ricardo Mitollo Bertani: Conceptualization, Formal analysis, Methodology, Validation, Writing - original draft. Reinaldo A. C. Bianchi: Writing - original draft, Formal analysis, Conceptualization. Anna Helena Reali Costa: Writing - original draft, Formal analysis, Conceptualization, Funding acquisition.
Acknowledgement
The authors acknowledge the support of CNPq (N. 425860/2016-7 and N. 307027/2017-1). The authors would also like to thank An Zeng and Wei Zeng, authors of Zeng et al. (2013), for their support with the reproduction of the MDHS algorithm, which was originally presented in Zhou et al. (2010). This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Brazil, Finance Code 001.
References
Beel, J., Gipp, B., Langer, S., & Breitinger, C. (2016). Research-paper recommender systems: a literature survey. International Journal on Digital Libraries, 17(4), 305–338. doi: 10.1007/s00799-015-0156-0.
Betru, B. T., & Onana, C. A. (2017). Deep learning methods on recommender system: A survey of state-of-the-art. International Journal of Computer Applications, 162(10), 975–8887. doi: 10.5120/ijca2017913361.
Bobadilla, J., Ortega, F., Hernando, A., & Gutiérrez, A. (2013). Recommender systems survey. Knowledge-Based Systems, 46, 109–132. doi: 10.1016/j.knosys.2013.03.012.
Celma, O. (2010). Music recommendation and discovery. ISBN 978-3-642-13287-2: Springer.
Chen, B., Zeng, A., & Chen, L. (2015). The effect of heterogeneous dynamics of online users on information filtering. Physics Letters A, 379(43–44), 2839–2844. doi: 10.1016/j.physleta.2015.09.019.
Chen, K., & Liu, L. (2009). “Best K”: Critical clustering structures in categorical datasets. Knowledge and Information Systems, 20(1), 1–33. doi: 10.1007/s10115-008-0159-x.
Deng, X., Zhong, Y., Lü, L., Xiong, N., & Yeung, C. (2017). A general and effective diffusion-based recommendation scheme on coupled social networks. Information Sciences, 417(168), 420–434. doi: 10.1016/j.ins.2017.07.021.
Fu, M., Qu, H., Moges, D., & Lu, L. (2018). Attention based collaborative filtering. Neurocomputing, 311, 88–98. doi: 10.1016/j.neucom.2018.05.049.
Harper, F. M., & Konstan, J. A. (2015). The movielens datasets: History and context. ACM Transactions on Interactive Intelligent Systems, 5(4), 19:1–19:19. doi: 10.1145/2827872.
Huang, Z. (1998). Extensions to the k-Means algorithm for clustering large data sets with categorical values. Data Mining and Knowledge Discovery, 2(3), 283–304.
Javari, A., & Jalili, M. (2014). A probabilistic model to resolve diversity-accuracy challenge of recommendation systems. Knowledge and Information Systems, 44(3), 609–627. doi: 10.1007/s10115-014-0779-2.
Kaminskas, M., & Bridge, D. (2016). Diversity, serendipity, novelty, and coverage. ACM Transactions on Interactive Intelligent Systems, 7(1), 1–42. doi: 10.1145/2926720.
Katarya, R., & Verma, O. P. (2016). Recent developments in affective recommender systems. Physica A: Statistical Mechanics and its Applications, 461, 182–190. doi: 10.1016/j.physa.2016.05.046.
Kotkov, D., Wang, S., & Veijalainen, J. (2016). A survey of serendipity in recommender systems. Knowledge-Based Systems, 111, 180–192. doi: 10.1016/j.knosys.2016.08.014.
Lacerda, A. (2017). Multi-Objective ranked bandits for recommender systems. Neurocomputing, 246, 12–24. doi: 10.1016/j.neucom.2016.12.076.
Liu, H., Hu, Z., Mian, A., Tian, H., & Zhu, X. (2014). A new user similarity model to improve the accuracy of collaborative filtering. Knowledge-Based Systems, 56, 156–166. doi: 10.1016/j.knosys.2013.11.006.
Lu, J., Shambour, Q., Xu, Y., Lin, Q., & Zhang, G. (2013). A web-based personalized business partner recommendation system using fuzzy semantic techniques. Computational Intelligence, 29(1), 37–69. doi: 10.1111/j.1467-8640.2012.00427.x.
Lu, J., Wu, D., Mao, M., Wang, W., & Zhang, G. (2015). Recommender system application developments: A survey. Decision Support Systems, 74, 12–32. doi: 10.1016/j.dss.2015.03.008.
Ma, T., Zhou, J., Tang, M., Tian, Y., Al-Dhelaan, A., Al-Rodhaan, M., & Lee, S. (2015). Social network and tag sources based augmenting collaborative recommender system. IEICE Transactions on Information and Systems, 98(4), 902–910.
Ma, W., Ren, C., Wu, Y., Wang, S., & Feng, X. (2017). Personalized recommendation via unbalance full-connectivity inference. Physica A: Statistical Mechanics and its Applications, 483, 273–279. doi: 10.1016/j.physa.2017.04.041.
Patra, B. K., Launonen, R., Ollikainen, V., & Nandi, S. (2015). A new similarity measure using Bhattacharyya coefficient for collaborative filtering in sparse data. Knowledge-Based Systems, 82, 163–177. doi: 10.1016/j.knosys.2015.03.001.
Pearson, K. (1920). Notes on the history of correlation. Biometrika, 13(1), 25–45. doi: 10.1093/biomet/13.1.25.
Ricci, F., Rokach, L., Shapira, B., & Kantor, P. B. (2010). Recommender systems handbook (1st). New York, NY, USA: Springer-Verlag New York, Inc.
Shambour, Q., & Lu, J. (2015). An effective recommender system by unifying user and item trust information for B2B applications. Journal of Computer and System Sciences, 81(7), 1110–1126. doi: 10.1016/j.jcss.2014.12.029.
Sánchez-Moreno, D., Gil González, A. B., Muñoz Vicente, M. D., López Batista, V. F., & Moreno García, M. N. (2016). A collaborative filtering method for music recommendation using playing coefficients for artists and users. Expert Systems with Applications, 66, 1339–1351. doi: 10.1016/j.eswa.2016.09.019.
Wang, X., Liu, Y., Zhang, G., Zhang, Y., Chen, H., & Lu, J. (2017). Mixed similarity diffusion for recommendation on bipartite networks. IEEE Access, 5, 21029–21038. doi: 10.1109/ACCESS.2017.2753818.
Witten, I. H., Frank, E., Trigg, L., Hall, M., Holmes, G., & Cunningham, S. J. (1999). Weka: Practical machine learning tools and techniques with Java implementations. In N. Kasabov, & K. Ko (Eds.), Proceedings of the ICONIP/ANZIIS/ANNES’99 workshop on emerging knowledge engineering and connectionist-based information systems (pp. 192–196). Dunedin, New Zealand.
Yang, Z., Wu, B., Zheng, K., Wang, X., & Lei, L. (2016). A survey of collaborative filtering-based recommender systems for mobile internet applications. IEEE Access, 4, 3273–3287. doi: 10.1109/ACCESS.2016.2573314.
Yu, F., Zeng, A., Gillard, S., & Medo, M. (2015). Network-based recommendation algorithms: A review. Physica A: Statistical Mechanics and its Applications, 452, 192–208. arXiv:1511.06252. doi: 10.1016/j.physa.2016.02.021.
Zeng, W., An, Z., Liu, H., Shang, M.-s., & Zhou, T. (2014). Uncovering the information core in recommender systems. Scientific Reports, 4, 6140. doi: 10.1038/srep06140.
Zeng, W., Zeng, A., Shang, M. S., & Zhang, Y. C. (2013). Information filtering in sparse online systems: Recommendation via semi-local diffusion. PLoS ONE, 8(11). doi: 10.1371/journal.pone.0079354.
Zhang, F.-G., & Zeng, A. (2015). Information filtering via heterogeneous diffusion in online bipartite networks. PLoS ONE, 10(6), e0129459. doi: 10.1371/journal.pone.0129459.
Zhang, S., Yao, L., & Sun, A. (2017). Deep learning based recommender system: A Survey and new perspectives. ACM Journal on Computing and Cultural Heritage (JOCCH), 1(1), 1–35. arXiv:1707.07435.
Zhang, Y.-C., Blattner, M., & Yu, Y.-K. (2007). Publisher’s note: heat conduction process on community networks as a recommendation model. Physical Review Letters, 99(16), 169902. arXiv:0803.2179. doi: 10.1103/PhysRevLett.99.169902.
Zhou, T., Kuscsik, Z., Liu, J.-G., Medo, M., Wakeling, J. R., & Zhang, Y.-C. (2010). Solving the apparent diversity-accuracy dilemma of recommender systems. Proceedings of the National Academy of Sciences of the United States of America, 107(10), 4511–4515. arXiv:0808.2670. doi: 10.1073/pnas.1000488107.
Zhou, T., Ren, J., Medo, M., & Zhang, Y. C. (2007). Bipartite network projection and personal recommendation. Physical Review E - Statistical, Nonlinear, and Soft Matter Physics, 76(4), 1–7. doi: 10.1103/PhysRevE.76.046115.
Ziegler, C.-N. C., McNee, S. M. S., Konstan, J. a. J., & Lausen, G. (2005). Improving recommendation lists through topic diversification. In Proceedings of the 14th international conference on World Wide Web WWW 05 (p. 22). doi: 10.1145/1060745.1060754.