networks model to hide user content interests in untrusted peer-to-peer Mistrustful P2P: Deterministic privacy-preserving P2P ﬁle sharing Computer Networks

(1)

ContentslistsavailableatScienceDirect

Computer Networks

journalhomepage:www.elsevier.com/locate/comnet

Mistrustful P2P: Deterministic privacy-preserving P2P ﬁle sharing model to hide user content interests in untrusted peer-to-peer networks

Pedro Moreira da Silva

^∗

, Jaime Dias, Manuel Ricardo

INESC TEC, Faculdade de Engenharia, Universidade do Porto Rua Dr. Roberto Frias, 378, Porto 4200-465, Portugal

a rt i c l e i n f o

Article history:

Received 3 October 2016 Revised 10 March 2017 Accepted 5 April 2017 Available online 6 April 2017 Keywords:

Peer-to-peer Deterministic privacy Content interests Plausible deniability Legal framework

a b s t r a c t

P2Pnetworksendowedindividualswiththemeanstoeasilyandefficientlydistributedigitalmediaover theInternet,butuserlegalliabilityissuesmayberaisedastheyalsofacilitatetheunauthorizeddistribu- tionandreproductionofcopyrightedmaterial.TraditionalP2Pfilesharingsystemsfocusonperformance andscalability,disregardinganyprivacyorlegalissuesthatmayarisefromtheiruse.Lackingalterna- tives,andunawareofthe privacyissuesthatarisefromrelaying trafficofinsecureapplications,users haveadoptedanonymitysystemsforP2Pfilesharing.

Thiswork aimsathidingusercontentinterestsfrommaliciouspeersthroughplausibledeniability.

TheMistrustfulP2PmodelisbuiltontheconceptofmistrustingalltheentitiesparticipatingintheP2P network,henceitsname. Itprovidesadeterministicand conﬁgurableprivacyprotectionthat relieson covercontentdownloadstohideusercontentinterests,hasnotrustrequirements,andintroducesseveral mechanismstopreventuserlegalliabilityandreducenetworkoverheadwhileenablingtimelycontent downloads.

WeextendpreviousworkontheMistrustfulP2Pmodelbydiscussingitslegalandethicalframework, assessingitsfeasibilityformoreusecases,providingasecurityanalysis,comparingitagainstatraditional P2Pﬁlesharingmodel,andfurtherdeﬁningandimprovingitsmainmechanisms.

1. Introduction

In the last two decades, the advances in computer technol- ogyhavedrasticallychangedthemedialandscape.Thewidespread availability of digital media and broadband Internet connections enabled consumersto become also producers anddistributors of creative work, something that inthe past was mostlylimited to professional parties. Peer-to-Peer (P2P) networks endowed indi- viduals with the means to easily and eﬃciently distribute digital media over the Internet, and are extensively used for large- scale ﬁle sharing due to their decentralized andscalable nature.

Nevertheless, asmoreinformation ﬂows throughthese networks, usersarebecomingincreasinglyconcernedabouttheirprivacy.The P2P architecture enables solutions that enhance privacy, such as Freenet[1],Nymble[2],andTor[3],butuserlegalliabilityissues may be raised asit also facilitates unauthorized distribution and reproductionofcopyrightedmaterial[4].

∗ Corresponding author.

E-mail addresses: [email protected] (P. Silva), [email protected] (J. Dias), [email protected] (M. Ricardo).

Thereasonsbehindseekingprivacymaybevarious,suchasto avoiduserproﬁling,trackinganddatamining, toprivately download legal contents that may be embarrassing or objectionable froma political, religiousorsocial point-of-view,orto download unethical, illegal orincriminating contents withoutbeing subject toanyliability. Thecurrentstate ofcomputer technologyenables organizationsandindividualstocreatedatabasesofpersonalinfor- mation that were previously impossibleto set up,and swapthis information,sellitoruseitinanyotherwayasacommodity[5]: organizationsandindividualsareabletocollectinformationabout users, combine facts from separate sources, and merge them to createsuch databasesofpersonalinformation.Usersmayfeelex- posedorembarrassedifitbecomespublicthattheyhadaccessto contentssuch asto helpdealingwithalcoholanddrug abuse,to help dealing withanger management, orconsidered heretical or profane.Evenaccessingillegalorincriminatingcontentsmayhave anethicalmotivationbecauselawsare,ideally,drawnfromethics, butthat is not always the case:e.g., autocratic regimes consider illegaltoshareorevenaccesscontentsthattheyconsiderinappro- priateorpotentiallyharmful.

Copyright is a legal device that protectsthe expression of an ideabyconferringthecreator,foraperiodoftime,exclusiverights http://dx.doi.org/10.1016/j.comnet.2017.04.005

(2)

to publish,sell and control the reproduction of his work, whom may grant or sell these rights to others. Copyright infringement andpiracyarethenviolationsofoneoftheseexclusiverights[6]. However,theserightsaregranted nationwide,notworldwide:the BerneConvention[7]isnotratifiedby allcountriesandonlysets the minimum standards for copyright. Therefore, given that P2P usersare spread all over the world anda single content sharing maycross manyjurisdictions, it is hard to determine the appli- cablelegal frameworkandtheextentofuserliability. Evenwhen thelegalframeworkcanbeclearlydefined,itmaystillbedifficult todeterminetheextentofuserliabilitysincethecopyrightrights mayconflictwithotherinterestsandrights,beingsubjecttopro- portionality assessment to balance all the rights andinterests at stake[8].

Traditional P2P file sharing systems focus on performance andscalability, disregarding anyprivacy or legal issues that may arise from their use. These systems take advantage of the large number of interconnected peers¹, and their idle resources, to moreefficientlydistributecontents at thecost ofrequiringpeers to publicly advertise what they download, making it trivial to identifyusercontentinterests. Thisproblemisfurtheraggravated bythefactthatpeersforminterest-basedcommunities,andevery single connection presents an opportunity for a malicious peer to passively obtain additional information that may enable the identification of user content interests, with high certainty, by monitoringjustasmallfractionofthenetwork[9].

Several privacy-preserving P2P systems have been proposed, but, given that there is a trade-off between privacy and performance, they consider differentattack models and employ differ- enttechniquesforprivacypreservation. Themajorityofthe solu- tionsemployeithertechniquestoprovideanonymityortechniques toprovideplausibledeniability. Techniquesto provideanonymity, such as onion routing [10] and information slicing [11], are usu- allystronger but havelower performance. Techniques to provide plausibledeniability,suchasrequestrelaying– peersrelayrequests tocreateuncertainty aboutcommunicating endpoints– and con- tentinterestdisguise– peersdownloadadditionalcontentstohide their real interests –, are usually weaker buthave better performance.Despitetheirdifferences,allP2Pfilesharingsystemsshare one common issue: they require peers to advertise, either fully orpartially,what they download.Lackingalternatives,users have adopted anonymity systems for P2P file sharing [12], misunder- standing the privacy guarantees provided by such systems [13], in particular when relaying traffic of insecure applications [14]. Anonymity systems provide a channel to anonymously transmit messages,buttheuser’sidentity maybedisclosedby thecontent ofthat messages,whicharethesoleresponsibilityoftheapplica- tion.

The Mistrustful P2P model [15] provides plausible deniability through content interest disguise, as in other privacy-preserving solutions,butithasnotrustrequirements,preventsuserlegalli- abilityincaseoflegitimate usage,andensuresdeterministic pro- tectionof usercontent interests against attacksofa size up toa conﬁguredlevel.Itisbuilton thebasis ofmistrustingall theen- titiesparticipatingintheP2Pnetwork,henceitsname,andthere- foreusersarenotrequiredtoestablishtrustlinksinordertopar- ticipateinthecontentsharing.TheMistrustfulP2Pmodelenables eachusertosettherequiredtrade-off betweenprivacyandperfor- mancebyconﬁguring,percontent,thesizeofthelargestgroupof colludingpeerstobeprotectedagainst,andtheminimumnetwork overheadofcontentinterest disguise(minimumnetworkdisguise overhead).

1We use the term peer to refer to the network node, and the term user to refer to the person.

Inthispaperweextendtheworkpresentedin[15]asfollows:

we discussthelegal andethical frameworkto pinpointthe main issuesandchallenges,andtoprovidesomeinsightregardinguser liability;furtherassessthefeasibilityoftheMistrustfulP2Pmodel byconsideringasetof84usecases;provideasecurityanalysisfor commonattacks;compareourmodelagainstatraditionalP2P file sharing modelto betterestimate theimpact ofprivacy preservation;furtherdefineittogetclosertoenablerealimplementations, whileimproving itsmain mechanismsto betteradapt tothe P2P filesharingdynamicsandtotheuserprivacyrequirements.

The remainder of this paper is structured as follows.

Section 2 depicts the main legal and ethical challenges at stake along with user liability considerations. Section 3 brieﬂy describes traditional P2P ﬁle sharing systems. Section 4 character- izes theproblemwe aim tosolve. The relatedwork ispresented inSection5.Section6describesthelatestversionoftheMistrust- fulP2Pmodel.Sections7and8provide,respectively,thesecurity analysisandtheevaluationofourmodel.The resultsanddiscus- sionarepresentedinSection9.Section10drawsthemainconclu- sions.

2. Legalandethicalframework

Computer ethicsis a field ofstudy that addresses the ethical challengesofcomputertechnologybutseverallinesofthoughtex- ist. Therefore, in this section our aim is to describe the ethical challengesthatweidentifiedbeingrelevanttoourwork.Werefer thereaderto[16]and[17]fora wideand thoroughexplanation of thesechallenges. Forthe legalandprivacydimensionsofP2P file sharing,basedontheintellectualpropertylaw,inparticularonthe copyrightlaw,weattempttohighlightthekeylegalaspectstotake intoconsiderationonWesterncountriesregardingdirectandindi- rectuserliability.

Lawsareformallyadoptedrulesthatmandateorprohibitacer- tainbehavior, createdby themembersofasocietytobalancethe individualrightstoself-determinationagainsttheneedsoftheso- cietyasawhole,and,ideally,aredrawnfromethics,whichdeﬁne whatisconsideredrightorwrong,i.e., thesocially acceptablebe- haviors.Thekeydifferencebetweenlawsandethicsisthatthefor- mercarry theauthorityofagoverning body,usually anation[5]. Inturn,ethicsarebasedonculturalmores,beliefs,valuesandprin- ciples,whichreﬂecttheuniqueexistentialexperiencesthatweac- cumulateasindividualsaswell associeties and,supportedonin- stitutions, provide long-term stablerules that are madeobvious.

Thus,theserules canbeseenasrefractionsofthecommonworld awareness that give rise to different experiences and interpreta- tions:multiculturalethics[18].

The cultural differences, despite some ethical standardsbeing universal(e.g.,murder andtheft),makeitdifficulttodefine what isethicalornot.Studieshaveshownthat theperspectiveoneth- ical practices ofindividuals regarding the use of computer tech- nology differs withtheirnationality. Asian traditions ofcollective ownership conflict with Western protection of intellectual property, and many of the ways the former use software is consid- eredsoftwarepiracybythelatter[5].Thesedifferingperspectives are alsoevident on the control overthe Internet content andon the surveillance made by governments: they radically differ be- tweencountries[19].Accordingtoareport[20]fromtheReporters Without Borders, the Great Firewall of China is getting “taller”, theUnitedKingdom isthe“world championofsurveillance”,and

“NSA²symbolizesintelligenceservices’abuses”;ontheotherhand, Finland,Netherlands,andNorwayareatthetopoftheWorldPress FreedomIndex2016[21].

2United States National Security Agency.

(3)

The utmost objective of intellectual property law is to promote progressbyencouragingandstimulatinghumanintellectual creativity andbroaddissemination ofits result [22]. Creatorsare granted exclusive rights asan incentiveto continuetheir works, and,atthesametime,thesocietycanbeneﬁtfromtheirbroaddis- semination.However,giventhattheenforcementofsuchexclusive rightsontheprivatesphereofusersmayclashwiththeirprivacy rights,theprivatecopyinglaw,whenpresent,introducesanexcep- tion to these exclusive rights undersome conditions.For private useandforendsthat areneither directlynorindirectlycommer- cial,anaturalperson³isallowedtocopycopyrightedworksonthe condition that the rightholders receive faircompensation. There- fore, anymedia that may be used for the reproduction of copy- righted works by consumers are designated for payment of the private copying levy, which willcompensate rightholdersfor any harm that maybecaused [23,24].The privatecopying law,when present,differsconsiderablybetweencountries[25],reﬂectingthe lackofconsensus andthecomplexityofthissubject. Moreover,it isnotclearthatprivatecopyingalways causesharm– e.g.,itmay increase thegroupoffansandenablesuserstotrybeforebuying –,andnotallmedia areusedforthereproductionofcopyrighted works.Wereferthereaderto[22,24,26]forfurtherdiscussionon thesubject.

P2Pnetworksconnectusersallovertheworldwithoutconsid- eringeitherinternationalbordersorculture,therebyasinglecon- tentsharingmaycrossmanyjurisdictions.Thesenetworksareusu- allyconnotedwithcopyrightinfringement becauseitisestimated that,overthecourseofamonth,96.3%ofusersofBitTorrentpor- talshavedownloadedatleastoneinfringingcontent[27].Itisnot feasibleto enforcetheexclusiverightsprovidedby copyrightlaw, ifpresent,inevery singlejurisdictionwhileensuringthattheydo notconflictwiththerightsandinterestsofusers.Asso,inan at- tempttomitigatetheimpactofcopyrightinfringement,righthold- ersarenowtryingtoblockwebsitessuchasThePirateBayinstead oftryingtobringenduserstocourt[28]because,ingeneral, itis moreefficient andlessonerous toprove that theownersofsuch websites profit from the copyright infringement. Therefore, such websitesaresubjecttoindirectliabilityastheyinduce,contribute to or fail to prevent direct copyright infringement. Nevertheless, this does not mean that sharing copyrightinfringing contents is legal.Onthecontrary,usersmaybesubjecttocivilandpotentially criminalliability [24].Theextent ofsuch liabilitydependsonthe applicable legalframeworkandontheintent ofsuchacts:a user mayalsobesubjecttoindirectliability.

The multitude and complexity of copyright laws across the globemakeitimpossibletodeﬁneclearboundariesregardinguser liability.Still,itisourbeliefthattheusersofP2Psystemswillnot beheldlegallyliablefor,unknowinglyandunwillingly,download- inganillegalcontent(directliability)orcontributetoitsdownload (indirectliability) ifthemainmotivationtousetheP2P systemis tosharelegitcontents,anditcannotbe proventhattheuserhad access, either inpartor fully,to the contentdata. Theuser does notbeneﬁtdirectlyfromthedownloadofanillegalcontentthathe hasnoaccessto,andisalsounabletodeterminethat thecontent isunexpectedly illegalbeforehaving accesstoitsdata.Therefore, it is plausible that heeither hasbeen mislead into downloading it or hadno intentionin assisting a misbehaving peer infringing copyrightlaw.

Insum, the basicethicalimperativesare thata personshould not, knowingly and willingly, cooperate in or contribute to the wrongdoing of another, and that the human intellectual creativity needs to be encouraged and stimulated in order to promote

3In jurisprudence, a natural person is a human being, as opposed to a legally generated juridical person.

SEEDER LEECHER

Owns

(all chunks) (downloaded chunks)

Shares

(all chunks) (downloaded chunks)

Downloads (does not download)

(missing chunks) Owned data chunk Missing data chunk

Fig. 1. Peer roles and sharing behavior on traditional P2P systems. A seeder has all data chunks and just shares them; a leecher is a peer still downloading missing data chunks and sharing the data chunks it has already downloaded.

progress.The extent of theseincentives depends on the cultural background as the well-being of the society may be incentive enough (collective ownership) or further incentives may be required(intellectual property rights).P2P networksintroduce new challengestothe privatecopying levysystem, andthe legislation onthissubjectisexpectedtochangeinthenearfuturetoaddress them.Users ofP2Psystemsshouldnotbesubjecttoanyciviland criminal liability asa resultof legitimate usage ofthe system if themainmotivation touseit istosharelegit contents, andthey havenoaccesstothecontentdata.Theintentisimportanttode- terminetheextentofuserliability,especiallyifhisactionsdirectly orindirectlycausedprovableharm.

3. TraditionalP2Pﬁlesharingsystems

TraditionalP2P filesharingsystems focusonperformance and scalability,disregardinganyprivacyandlegalissuesthatmayarise from their use. In this section we provide an overview of these systems,andpresenttheir mainprivacy andlegalissues.We use BitTorrentas an example given that it is the mostprominent of suchsystems.Westartby providingabriefdefinitionofcommon BitTorrent terminology – hash, content, chunk, torrent file, peer, seeder,leecher, swarm, andtracker –, describe its content sharing process,andendpresentingthemainprivacyandlegalissues.

Ahashisanalphanumericstringusedtoidentifyandtoverify the integrity of the data being transferred, usually an SHA-1 di- gest (hexadecimalstring). A contentis composed ofone or more files, identified by the hashof its data,and partitionedinto sev- eralpieces.Achunk isone ofthosedatapieces, andatorrentfile providesthecontent’smetadata.Apeerisanodesharingchunks, which, fora given content, can be either a seeder – a peerthat has all data chunks and is justsharing them – or a leecher – a peerstill downloadingthecontent andsharingthe chunksit has alreadydownloaded. Fig.1 depictsthepeerroles andtheir shar- ingbehavior.Aswarmisthesetofpeerssharingthesamecontent (peers forminterest-basedgroups tosharecontents), andisusu- allyidentifiedthroughthehashofthecontent.Atrackerisacen- tralnode that provides listsofpeers inswarmsby keepingtrack ofwhichpeersare sharingwhich contentsandtheir role (seeder orleecher)oneachcontent.

Traditional P2P systems take advantage of the large number of interconnected peers, and their idle resources, to more eﬃ- ciently distributecontents; their main performance metric is the average download time. E.g., BitTorrentprotocol employs several mechanisms for chunk and peer selection, such as rarest ﬁrst

(4)

A A

Tracker 1. Register

S L A

mrawS eht nioJ

reeP tceleS .3

S

L L

L

4. Synchronize Owned Chunks

5. Transfer Chunk SWARMX

S Seeder L Leecher Swarm Tracker

A Unregistered Peer A A Leecher A A Seeder A

Fig. 2. Overview of the content download process on traditional P2P systems. First, peer A registers at the tracker to join the swarm of content X (swarm X). After registering, the peer starts as a leecher (still downloading) and requests from the tracker a list of peers (a subset) sharing that content. Until download completion, peer A selects peers from that list, synchronizes the chunks it owns with theirs, and transfers those available, if any, that it misses; the list of peers may be updated during content download. Upon download completion, peer A notiﬁes the tracker to be known as a seeder.

mechanism – locally rarest chunks are downloaded ﬁrst –, and optimisticunchoking– periodicallyselectarandompeer– aiming atconstantlyimprovingthedownloadbitrate[29].Thedistinction betweenseedersandleechersis used toestimate relative download times (through their ratio), and also to quickly assess the contentavailability⁴,whichisassuredifatleastasingleseederis present.Thestepsrequiredtodownloadagivencontent,depicted in Fig. 2, can be summarized as follows: (1) a peer willing to download a given content registers at the tracker and joins the swarm;(2)itrequestsalistofpeers(asubset)intheswarmfrom the tracker; (3) it selects peers from that list to obtain missing chunks inorderto complete itsdownload; (4)peers synchronize thechunkstheyownandmisstodetermineifanymissingchunks canbetransferred;(5)ifselectedpeersownmissingchunks,those chunks get transferred;(6) upon downloadcompletion, the peer notiﬁesthetrackertobeknownasaseederforthatcontent.

ThemainprivacyissuesoftraditionalP2Psystemsarethatthey publiclydisclose user content interests, andprovide a proof that theuserisabletoaccessacontentinpartorentirely;theformer disclosesan intention whilethe latterprovides a proof of its re- alization.Apeeronlydownloadsthecontents thatauserisinter- estedin,therefore,byregisteringatatrackerandjoiningswarms, theusercontentinterests arebeingpubliclydisclosed.Peers provide a proof that the useris able to accessa content inpart or entirelybyadvertisingwhichchunkstheyown,andentirelybyno- tifyingthetrackerupondownloadcompletion.

ThemainlegalissueoftraditionalP2Psystemsisthat,unknowinglyandunwillingly,a usermaybe heldliable forcopyrightin- fringement dueto illegal content download.If a content is pub- lishedwithamisleadingdescriptionoriftheresourceusedtoob- tainthehashofthecontentisinsidious,theuserwillonlybecome awareofthatfactafteraccessingthecontentdatainpartorfully.

Nevertheless,forcontentsthatcanbeaccessedinpartbeforefully downloadingthem,which isusually the caseofmultimedia contents,theusermaybe infringingcopyrightlawafterdownloading asingleora fewchunks.The copyrightinfringementcanbe triv-

4A content is available if it can be fully downloaded.

ially provedbecause peersadvertisewhichchunks they ownand notifythetrackerupondownloadcompletion.

4. Problemdeﬁnition

Inthe context ofthiswork, wedeﬁne privacy preservationas theconcealmentofusercontentinterests.Weaimatdevelopinga privacy-preservingP2Pﬁlesharingmodelthat:

Hidesusercontentinterests. Theusermustbeabletohidehiscon- tentinterestsfromanyparticipantinthesystem:trackers,regular peers,orgroupsofcolludingpeers.Protectionagainstexternalen- titiesmonitoringalltheuser’straﬃc,suchasISPsorgovernments, isoutofthescopeofthiswork.

Has no trustrequirements. The user must be ableto download a content without having to trust anyone. Therefore, the privacy- preservingP2Psystemmustenablecontentsharinginlargegroups ofuntrustedpeers(untrustedP2Pnetworks).

Prevents user liability. The usershallnot be subjectto anyliabil- ity as a result of legitimate usage of the privacy-preserving P2P system. The user shall be protectedagainst actions ofmisbehav- ingnodes,orfromthedownloadofcontentspublishedwithmis- leadingdescription (misleading contents),which maydrive users tounknowinglyandunwillinglydownloadunethicalorillegalcon- tents.

Enablestimelydownloads. Theusershouldbe abletodownloada contentinduetime.Theaveragedownloadbitratemustbewithin the same order of magnitudeof traditional P2P systems that do notpreservetheprivacyofusers.

5. Relatedwork

Severalprivacy-enhancingP2P systems havebeenproposed in the literature for a wide range of applications, providing differ- entdegreesofprivacytousersandemployingvarioustechniques, the majority of which provides privacy preservation through anonymity or through plausible deniability. Freenet [1] and Tor[3]are probablythemostprominentanonymity solutionsfor, respectively, anonymous content distribution networks and low- latency anonymity.Despitetheir merits, anonymity systems tend tointroducemoreoverhead,andtheirusersmaybeheldindirectly liable.Forinstance,Tordefaultstoapathlengthoffour(300%network overhead), which lowers the throughput and increases the average latency [30], anda Tor relay node (core router) mayre- laytraffic ofmisbehaving peers,whichmakesthelast oneonthe path(exitnode)toappeartobetheoriginatorofsuchtraffic.Sys- temssuchasOneSwarm[31],whichrelyontrustlinkstoprovide privacy,arenotconsideredgiventhattheyarenotsuitableforun- trustedP2Pnetworks.Herein,wedepicttheprivacy-enhancingP2P systems providing plausible deniability and designed specifically forP2Pfilesharing.

The description provided for each privacy-enhancing P2P system focus on the trust requirements, on the protection against bothpassiveandactiveattacksaimingatidentifyingusercontent interests,andonthepotentiallegalliabilityofusers.

BitBlender[30]providesplausibledeniabilitybyintroducingre- laypeersthatsimplyproxyrequestsonbehalfofotherpeers.Peers willing toact asrelay peers canregister ata centralnode called blender,and,oncerequested,willjoinaP2Pswarminaprobabilis- tic wayso that they cannot be distinguished fromregular peers.

The joining probability of relay peers is deﬁned by the blender, when asking registered peers to join a P2P swarm, so that the

(5)

Table 1

Comparison of privacy-preserving P2P ﬁle sharing systems regarding trust requirements, protection against passive and active attacks aiming at identifying user content interests, and potential user legal liability.

Protects Against Attacks

Work Has No Trust Requirements Passive Active Prevents Legal Liability

BitBlender ✗ ✗ ✗

SwarmScreen ✗ ✗

The BitTorrent Anonymity Marketplace ✗

Petrocco et al. ✗ ✗

set of relay peers remains unknown while having the cardinal- ity requested by the tracker. It enables the identiﬁcation of reg- ularandrelaypeersthroughdownloadprogresstracking,andpro- vides a trivial proof of content download because peers advertisewhat they download.Thereby,BitBlenderprovides protection against passive attacks, but it is vulnerable to active attacks us- ingdownloadprogresstracking.Itrequiresuserstotrustboththe tracker and the blender. Its users may be subject to legal liabil- ityfordownloadingmisleadingcontents,andforrelayingtraﬃcof misbehavingpeers.

SwarmScreen [9] provides plausible deniability by obscuring user content interests through content interest disguise.The de- vised scheme, whichconsists in “addinga small percentage(between 25% and 50%) of additional random connections that are statisticallyindistinguishablefromnaturalones”,thwartsguilt-by- associationattacks,thatis,attacksinwhichtheusercontentinter- ests can be inferred with highcertainty just by classifyingpeers based on the behavior of the communities they participate in.

SwarmScreenpresentsnotrustrequirements,butitsattackmodel onlyconsiderspassiveattacks.Itisvulnerabletoactiveattacksbe- cause contents can be distinguished through download progress tracking. Peers communicate through direct links, butusers may besubjecttolegalliabilityduetomisleadingcontentdownload.

The BitTorrent Anonymity Marketplace [32] follows Swarm- Screen’s approach to provide plausible deniability. It does not present any trust requirements, and peers also communicate through direct links. However, in order to protect against both passive and active attacks, all contents are fully downloaded to makethemindistinguishable,giventhatanattackerisabletofully trackdownloadprogress:peersadvertisewhatthey ownormiss.

Theauthorsdeﬁnek-anonymityastheprivacyprotectionlevelob- tainedfromfullydownloadingkcontents.Thus,sinceitintroduces highnetworkoverhead,it eitherpreventsdownloads fromtimely completingorconstrainstheachievablelevelofprivacyprotection.

Usersmaybesubjecttolegalliabilityduetothedownloadofmis- leadingcontents.

Petrocco et al. [33], following SwarmScreen’s approach, proposed a system that aims at hiding user content interests without compromising timely downloadcompletion. Their systemre- lies on private swarms, request relaying, caching, and partial advertisement ofdownloaded chunks.As statedby theauthors,private swarms are required to ensure a good level of privacy, i.e., toavoidtheidentiﬁcationofusercontentintereststhroughdown- loadprogresstracking.Yet,toobtainthecredentialsneededtojoin a privateswarm, peersmusttrustone ormoreparticipants.Also, asonlyafractionofthechunks areadvertised,itisnotclearhow acontentsharingisbootstrappedwithfewseedersorhowrequest relayshould operateduringperiodsofcontentunavailability.This systemprovides protectionagainstpassiveandactive attacks,but itsusersmaybeheldlegallyliableduetorelaytraﬃc.

Table 1 summarizesand compares the privacy-preserving P2P ﬁlesharingsystemsdescribedabovetakingintoconsiderationthe trust requirements,theprivacyprotection againstpassiveandac- tive attacks, and the potential legal liability of users. BitBlender and SwarmScreenare unable to hide user contentinterests from

active attacks using download progress tracking. The BitTorrent Anonymity Marketplace andPetrocco et al.’s systemsare able to thwartbothpassiveandactiveattacks.BitTorrentAnonymityMar- ketplacerequirespeers tofullydownloadall contentsinorder to make them indistinguishable, and therefore introduces a consid- erablenetworkoverheadthatwilleitherprevent downloadsfrom timelycompletingorconstraintheachievablelevelofprivacypro- tection.Petroccoetal.’ssystemrequirespeerstotrustoneormore participantsinordertoreducenetworkoverheadby usingpartial advertisementofdownloadedchunks.

All solutions require peers to advertise, either fully or par- tially,what they ownormiss. Anattackermayexploit download progresstrackingtoobtaina proofthat theuserisabletoaccess a content either entirelyor in part, andto narrow or even void plausibledeniability. Forunethical orillegal contents that canbe accessedin partbeforefullydownloading them,which isusually thecase ofmultimedia contents, an attackermayobtain a proof ofsuch access.As so,theusermaybe held legallyliablefor,unknowinglyandunwillingly,downloadingasingleorafewchunks, beitduetotraﬃc relayingorduetomisleadingcontentdescrip- tion. Thereby, a legitimate usage of these systems may hold the userliableforcopyrightinfringement.

6. MistrustfulP2Pmodel

In this section we describe the latest version of our model, whichisbuiltupontheassumptiondescribedinSection 2:legiti- mateusersshouldnotbesubjecttoanylegalliabilityifthesystem ismainlyusedtosharelegitcontentsandthereisnoaccesstothe contentdata(noproofofaccess).Wereferthereaderto[15]fora priorversion.

This section is structured as follows. Section 6.1 provides an overviewofourmodelanditsmainbuildingblocks– contentin- terestdisguiseandmistrustfulsharing– tobestdescribehowthe problemwe aim to solve is addressed. Section 6.2discusses the peerrolesandtheirsharingbehavior,thecontentsharingprocess, andtheroleofmistrustfulsharingonit.Theattackmodelconsid- eredisdeﬁnedinSection6.3.Sections6.4through6.8characterize theinstantiationsofeachoneofthemechanismsusedontheeval- uationofourmodel,whosemain focusisonits feasibilityrather thanonitsoverallperformance.

6.1. Overview

TheMistrustfulP2P modelisbuiltontheconcept ofmistrust- ing all the entities participating in the P2P network, hence its name,andthereforeusers arenot requiredto establishanytrust linksinordertoparticipateinthecontentsharing.Itreliesontwo main building blocks – content interest disguise and mistrustful sharing– andaimsathidingusercontentinterests throughplau- sibledeniability,inuntrustedP2Pnetworks,whileovercomingthe mainlimitationsofakinP2Pﬁlesharingsystems.Theselimitations canbesummarizedasfollows:(1)peersare requiredtoadvertise what they download enabling passive attacks; (2) protection againstactive attacksis onlyachievedby introducingeithertrust

(6)

requirements or considerable network overhead; (3) the privacy protectionagainstboth passiveandactiveattacks isprobabilistic;

(4) legitimate users may be held legally liable for, unknowingly andunwillingly,downloadingorrelayingtraﬃcofillegalcontents.

Content interest disguise, as in other privacy-preserving P2P systems,hidesusercontentintereststhroughplausibledeniability.

The usercontent interests are hidden by downloading both con- tentsthat theuser isinterestedin (genuine)andadditional contents of no interest to the user (cover), as long as they cannot bedistinguished. Registeringata trackerandjoininga swarmno longerrepresentsinterestinthat content,andusercontentinter- estscannolongerbeidentiﬁedbymonitoringjustasmallfraction ofthenetwork[9].Acontentinterestdisguiseschemeselectsthe setofcovercontents,howmuchofeachonetodownload,andmay impose constraints to the content sharing in order to hide user contentinterests. The evaluationof the Mistrustful P2P model is centered on the feasibility of its novelty, the mistrustful sharing buildingblock,andtherebytheproposalofacontentinterestdis- guiseschemeisoutofthescopeofthiswork.

Themistrustfulsharingbuildingblockenablestheusertocon- ﬁguretherequiredtrade-off betweenprivacyandperformanceby deﬁningthe size c ofthelargest colluding groupto be protected against and the minimum amount m of chunks that are to be downloaded per covercontent (minimum network disguiseover- head),wherec≤m<kforanycontentpartitionedintokchunks sothatthereisnoproofoffullcontentdownload(proofofdown- load).Proof of access is then avoided by encoding contents in a waythatonlyenablesdecodingafterfulldownload.Themistrust- fulsharingbuildingblockiscomposedoftwocoremechanisms– erasurecoding and disclosureconstraint –, andthree supporting mechanisms– blockselection,requestbackoff,andpeerselection.

ItenablestheMistrustfulP2Pmodeltoovercomelimitations(1)to (4)asfollows.

Peersavoidadvertisingwhattheydownloadbyrequestingfrom otherpeersrandom chunks,thus defeatingpassive attacksofany size;theblockdownloadprocessandthemessagesexchangedare describedinSection6.2.Attackershavethentoengageinthecon- tent sharing because the chunks owned ormissing are only im- plicitlydisclosedtootherpeerswhilesharing.Activeattacks,ofa size up to c, are defeatedby constraining the amount ofchunks disclosedtoanysetofcpeerstobe,atmost,m:anattackerisnot ableto distinguish covercontents fromgenuine ones because at leastmchunks aredownloaded ofeach content.As so,usercon- tentinterestsaredeterministicallyhidden.Forlegitimateusers,le- galliabilityispreventedby notrequiringpeersto relaytraﬃcon behalf ofother peers, by neverfully downloading covercontents (theirdataisneveraccessible),andby avoidingbothproof ofac- cessandproof ofdownload foranyattack ofa size up to c.The protectionofusercontentinterests andtheuser liabilityare dis- cussedinmoredetailinSection7,whichalsodescribesthecoun- termeasuresemployedagainstcommonattacks.

The erasurecodingmechanismisthecoremechanismusedto enabletheprobability ofrandomly retrievinga chunk tobecome signiﬁcant,andto enabledecoding onlyafterfulldownload.Con- sidering a content divided into k chunks, uniformly distributed acrossthenetwork,the probabilityofretrievingthe lastchunk is just1/k.The erasurecodingmechanismgenerates a setofnera- surecoded chunks (blocks),from a set of k chunks, so that any subset of k blocks enables to retrieve the content, where k= k(¹+

(^k))^and

⁽^k⁾îs^theêrasure^codingôverhead.Îf^theⁿ^blocks

arealsouniformlydistributedacross thenetwork, theprobability ofretrievingthelastrequiredblockincreasesto1−(^k−1)/n.As anexample,fork=20,n=5·k=100,and

(^k)=0(optimalera- surecode),theprobabilityincreasesfrom5%to81%.

The disclosure constraint mechanism is the core mechanism used to ensure that no more than m blocks are disclosed (re-

quested orshared) toany setof c peers (largest colluding group considered),andthuspreventsproofofdownload(m< k).Italso contributesto thereduction ofthe overheaddueto coverdown- loads because only m blocks need to be downloaded per cover contentinorderto avoidthe identiﬁcationofgenuine downloads amongcoverdownloads.Thismechanismenablesthedisclosureof atleastoneblocktoeachpeer(m≥c),and,onaverage,m/cblocks canberequestedfromeachpeer.Reducingtheamountofavailable block requestsbyeitherincreasingc ordecreasingmmayimpact theoverallperformance.

The remaining three mechanisms – block selection, request backoff, and peer selection – are deﬁned to enable the evalua- tion ofourmodel,andto ensure thatcontents are timely downloaded.Theblock selectionmechanismdetermines whichblockis tobeoffered toagivenrequestingpeer,affectingthedistribution ofblocksamongpeersandthereforetheprobabilityofretrievinga usefulblock(innovativeblock).Therequestbackoff mechanismde- terminesthe delaybetweenblockrequests aimingatmaximizing theamount ofusefulblocks thatcan be obtainedfromthe avail- ableblockrequestsintheshortesttimeframe.Withthesameaim, thepeerselectionmechanismselectsa peertowhichablock re- questwillbesent.

The instantiation provided for each mechanism, described in Sections 6.4 through 6.8, is the one used to evaluate our model inSection 8.Due to thelarge numberofvariables andfactorsat play,weconsidertheimpactofthosethatareexpectedtochange more often – c andm, content size, peerarrival rate, and number of seeders–, and ofcover downloads on the average down- loadbitrate andon the download completion ratio.For the sake ofclarityandtractability,otherfactorsandvariablessuchasinter- contentrelations,incentivestoshare,parallel chunkrequests,and Internetconnection heterogeneityarenot considered.Theevalua- tionaimsatdemonstratingthefeasibilityofourmodelratherthan atoptimizingitsoverallperformance,i.e.,itaimsatdemonstrating thatpeersareabletotimelydownloadcontentswithoutadvertis- ingwhattheydownload.

In sum, the mistrustful sharing building block reinforces the content interest disguise because the distinction between gen- uineandcovercontents ishardenedbynotdisclosingwhatpeers downloadormiss,andalargersetofcovercontentscan beused astheydonotneedtobefullydownloaded.Itpreventslegitimate usersfrombeingheldliable dueto covercontentandmisleading contentdownloads.Cover contentsarenever fullydownloaded to guaranteethattheuserhasneveraccesstotheircontentdata.Mis- leadingcontentsmaybefullydownloaded,butthereisnoproofof accessorproofofdownloadforanyattackofasizeuptoc. 6.2. Peerrolesandcontentsharing

Peers,percontent,cantakeoneoftworolesdependingontheir privacyrequirementsandthewaytheycontributetotheﬁleshar- ing:seeder– peerhavingacontentthatwantstoshare,andwilling to forgo its privacy –, orcommoner – peerwilling to participate in the content sharing ifits privacy requirements can be met.A seedermaybetheauthororapartyinterestedinpublishingacon- tent,andthereforedoesnotrequiretheconcealmentofitscontent interests. It generates a unique block (erasure coded chunk) for each requestit receives, andonly refusesto serve block requests ifit has no resources available. Onthe other hand,a commoner doesnotgeneratenewblocks,onlysharesthemifitsinterestsre- main hidden, and onlyhas accessto the content data afterfully downloadingthecontent.Acommonerkeepstrackoftheblocksit shareswithotherpeersbothforprivacyprotectionandtoavoidof- feringablocktwicetothesamepeer.Itmayrefusetoserveblock requestsifithasnousefulblockstooffer,duetoresourceorpri- vacyconstraints,orduetocontentinterestdisguisestrategies.The

(7)

SEEDER COMMONER

Owns

(all chunks) (downloaded blocks)

Shares

(generated blocks) (downloaded blocks)

Downloads (does not download)

(enough useful blocks)

…

Owned data chunk Missing block (erasure)

Block (erasure coded chunk)

Fig. 3. Peer roles and sharing behavior on the Mistrustful P2P model. A seeder has all data chunks and generates a new block (erasure coded chunk) for each incoming request; a commoner is a peer that may be still downloading enough blocks to reconstruct the content data and shares the blocks it has already downloaded. As so, a commoner never shares data chunks and only has access to the content data after fully downloading the content.

r e d i v o r P r

e t s e u q e

RPeerA) (PeerB) (

Request(any)

−−−−−−−−−−−→

Offer(x)

←−−−−−−−− (if not refused) Accept(x)or Cancel(x)

−−−−−−−−−−−−−−−−−−−→

T ransfer(x)

←−−−−−−−−−− (on acceptance) Fig. 4. Messages exchanged during block download process. Peer A requests a random block from peer B, which either refuses the request or offers a block with id x . If the block is accepted by peer A, it will get transferred; otherwise, no transfer occurs to avoid unnecessary network overhead.

commoner neverdiscloses thereasonbehindrefusal. Fig.3 sum- marizesthepeerrolesandtheirsharingbehavior.

Theblock downloadprocessontheMistrustfulP2P modeldif- fers considerablyfromtheone onother P2P ﬁlesharingsystems, giventhat peers donot advertisewhat they download.This process is summarizedin Fig. 4,where peer Ais the requester and peer B is the provider, and works as follows. Peer A requests a random block from peer B, i.e., without providing the id of an intended block. Peer B then either refuses therequest by simply ignoring it, andthus not disclosingthe reasonbehind refusal, or replieswiththeidoftheblockitiswillingtoshare,whichmustbe differentfromanyotherthat mayhavebeenpreviouslydisclosed betweenthem (both asrequester andasprovider). Ifpeer Bhas offeredablock,peerAsendsareplymessageeitheracceptingthe offeredblock,inordertostartitsdownload,orcancelingtheblock request, in orderto avoidunnecessary network overhead.Unless theblockrequesthasbeencanceled,peerBsendstheblockithas offered to peer A. The possible outcomes of a block request are furtherdiscussedinSection6.6.

The content download process of the Mistrustful P2P model, depicted in Fig. 5, consists in the following steps: (1) the peer registers at the tracker to join the swarms of both genuine and cover contents in order to disguise the content interests of the user, without disclosingits role; (2)it requests a listof peers (a subset)intheswarmfromthetracker,whichisnotawareofeach peer’s role;(3) it selects peersfrom that list that cope withthe userprivacy requirements(eligiblepeers);(4)itrequests random blocksfromthosepeers inordertocompleteits download;(5)if

A

Tracker 1. Register

S C A

mrawS eht nioJ

SWARMX

reeP tceleS .3

S

C C

C

SWARMC1 SWARMCZ

SWARMC0

S Seeder C Commoner Swarm Tracker

4. Request a Random Block

A Unregistered Peer A A Commoner A 5. Transfer Useful Block

Fig. 5. Overview of the content download process on the Mistrustful P2P model.

First, peer A registers at the tracker to join the swarm of content X (swarm X) without disclosing what blocks it owns or misses; it also joins swarms C 0through C zto disguise user content interests. Then, it requests a list of peers (a subset) in the swarm from the tracker. Until download completion, and as long as the privacy requirements are met, peer A selects eligible peers in the swarm and requests them random blocks. To prevent unnecessary network overhead, only offered blocks that are useful get transferred; otherwise, the block requests are canceled. The list of peers may be updated during content download, and there is no notiﬁcation upon download completion.

A

S

reeP tceleS .3

C EC

DC RB

PS

DC

S Seeder C Commoner Swarm A Commoner A

EC Erasure Coding RBRequest Backoff BS Block Selection PS Peer Selection DCDisclosure Constraint

BS

EC

Data chunk Block Missing block 4. Request a Random Block

5. Transfer Useful Block

Fig. 6. Mistrustful P2P model mechanisms and their role in the content sharing.

The erasure coding mechanism is used by a seeder to generate new blocks, and by a commoner to retrieve the content data after fully downloading a content. The disclosure constraint mechanism determines if a block request can be sent to a peer or accepted by a given commoner. The block selection mechanism determines which block is to be shared, if any; the same block is never offered twice to the same peer.

The peer selection mechanism selects an eligible peer to which a block request can be sent. The request backoff mechanism deﬁnes the delay between block requests.

theselectedpeersofferusefulblocks,thoseblocksgettransferred;

otherwise,the requests are canceled andno transfer occurs. The listofpeersmaybeupdated duringcontentdownload,andsteps (3),(4)and(5)are repeateduntil downloadcompletion (genuine contents)or untilat leastm blocks have beentransferred (cover contents).Thereisnonotiﬁcationfromthecommonerupondown- loadcompletion.

Thescopeofactionofthemistrustfulsharingmechanismsen- tailsmostlysteps(3),(4)and(5).Theirroleonthe contentshar- ingisillustratedinFig.6.Theerasurecodingmechanismenablesa commonertoretrievethecontentdataafterfullydownloadingthe codedcontent, andenablesa seeder togeneratea newblock for

(8)

each incoming request. The disclosure constraintmechanism de- terminesifa block request canbe sent to a peeroracceptedby agivencommoner.The blockselectionmechanismisusedbythe contactedcommonertodeterminewhichblock istobeofferedto therequestingpeer,ifany.Therequestbackoff mechanismdeter- minesthedelaybetweenblockrequestsofacommoner.Thepeer selectionmechanismselectsthepeertowhichtheblockrequestis sent.

6.3.Attackmodel

We assume that an attackermight be anyentity that partici- pates inthe system – apublisher, a tracker, a regular peer, ora groupofcolluding peers.Externalentitiesmonitoringalltraﬃc of apeer,such asISPs orgovernments,are out ofthe scope ofthis work.

Beingaparticipantofthesystem,weconsiderthatanattacker isable to engage inthe content sharing as a commoneror asa seeder, to be a tracker, and to publish contents with misleading description.Also, we consider that an attacker may coordinatea largenumberofpeersthat colludewitheach other(collusion attack) orassume multiple pseudonymousidentities (Sybil attack), whichisequivalenttoalargercolludinggroup.

The creation of large number of pseudonymous identities (Sybil attack) was ﬁrst considered by Douceur [34], and is one of the most dangerous attacks that plague P2P networks [35]. Douceur[34] showed that, without trusted identity certiﬁcation, Sybil attacks are always possible when considering realistic sce- narios.In untrusted P2P networks, unlike collusion attacks,Sybil attacks can be mitigated without explicit information either by performing resource testing or by applying recurring costs and fees[35].We referthereaderto[36] fora surveyonexistingap- proaches.

In thecontext ofour work,collusion andSybilattacks aimat increasingtheamountofblocksdisclosedbyotherpeerstoasin- gle entity or to colluding entities, so that either content down- loadcan be proven or usercontent interests can be determined.

TheMistrustfulP2Pmodelprovidesprivacyprotectionagainstsuch attacksof a size up to c peers (colluding group),be it colluding peers,Sybilpeers,oracombinationofboth.Itrequiresc≤m<k, andtherefore mandk areupperboundsforc.Ontheone hand, despiteincreasingtheminimumnetworkdisguiseoverhead,mcan beincreasedasneeded(uptok)giventhatitisuserconﬁgurable.

Ontheotherhand,kisdeﬁnedbythepublisherofthecontentand cannotbechanged.Thereby,inordertoextendtheprivacyprotec- tionwithoutincreasingthesizeofthelargestcolludinggroupcon- sidered(c),theusermayconﬁgureassingleentitiesallthesetsof peersthatheconsiders,orsuspects,tobecolludingortobeSybil peers.

Peers are identified by their public IP addresses because we considerthatIPaddressesprovideamoreflexibleresourcetesting thatcan be madehardertoacquire thanother resources such as humantime,networkbandwidth,computationalpowerorstorage capacity,whichcannotberelated.Forprivacyprotectionpurposes, asetofpeerswhoseIPaddressesareconsideredtoberelatedcan be treated asa singlepeer (IP address aggregation), or multiple peers sharing a single IP address can be treated individually by consideringthe (IP,port) pairasthe identifier(IP address multi- plexing).Theuser isthen abletoconfigure howIP addressesare treatedforprivacyprotectionpurposes,providingtheflexibilityto goaslowastreatingall(IP,port)pairsasuniqueidentities,e.g.all peersbehindNAT⁵,uptotreatingallIPaddressesofagivenentity, colludingentities,city,countryoranyothersetofIPaddressesasa

5Network Address Translation.

singlepeer.Thisalsoenablestheusertoconﬁgurethesetofrules that are applied to peers using anonymous systems such asTor, whichcanbeIPaddressaggregationrules,IPaddressmultiplexing rulesoranycombinationofboth.

6.4. Erasurecoding

An erasure code generates a set of n symbols from a set of k symbols, k < n, so that any subset of k(¹+

(^k)) ^is ^enough to reconstructtheoriginal information,where

⁽^k⁾ ^is ^the^erasure

codingoverhead.Erasurecodes areusually classiﬁedaccordingto threeorthogonalproperties:systematicity,rateﬁxedness,andcod- ing overhead.An erasurecode issystematic iftheinput symbols areembeddedintooutputsymbols,andnon-systematicotherwise.

If n is static and needs to be known before encoding, the era- surecodeisﬁxed-rate.Ifncan bedynamically increasedandthe amount of symbols that can be generated does not impose any practicallimitation,theerasurecodeisrateless.Finally,anerasure code is said Maximum Distance Separable (MDS) if any k symbols out of n are enough to reconstruct the original information [

(^k)=0],ornon-MDS ifadditionalsymbolsare required[

⁽^k⁾ >

0].Non-MDS erasurecodesreduce signiﬁcantly theencoding and decodingtimecomplexityordersbyintroducingcodingoverhead.

The erasure coding mechanism supports both MDS and non- MDSrateless erasurecodes,butrequiresthe contentto beeither codedorencryptedinawaythatonlyenablesdecodingafterfull download.Forevaluationpurposes,weconsiderscenariosinwhich theMDSerasurecodes aremoresuitable, e.g.those inwhichthe networkisthemostconstrainedresource,andthuserasurecoding overheaddoesnotrepresentoneadditionalvariablethat needsto be considered [

(^k)=0]. We refer the reader to [37] fora rate- lessMDSconstruction ofReed-Solomon codesthat we developed for our model. These erasure codes are defined over the finite field Fp² with⁽ⁿ^log^k⁾ êncoding ând^min^{ⁿ^log^n,^k^log²^k^}^decod- ing timecomplexities, wherepisa Mersenneprime (^p=2^q−1) andn≤2^q⁺¹.TheirperformancewasevaluatedoverF₍2³¹−1)²,son

≤2³²,anddoesnotimposeanyconstraintstothecontentsharing.

Theyalsoenabledecodingonlyafterfullcontentdownload.

6.5. Disclosureconstraint

Thedisclosureconstraintmechanismenablestheusertoconﬁg- ure,per content,the requiredtrade-off between privacyandper- formance by setting the size c of the largest colluding group to be protectedagainst,andthe minimumamountm ofblocks that needtobedownloadedpercovercontent.Thesizecofthelargest potentialattackerisdeﬁnedbythenumberofuniquepeers(unre- latedpublicIPaddresses)thatarecontrolledbyasinglegroup,ei- therasingleattackeroragroupofcolludingattackers.Thismech- anism ensures that coverandgenuine content downloadscannot be distinguished by tracking their downloadprogress because at mostmblocksaredisclosedtoanysetofcpeers,andthatnoma- licious peercan prove that a userdownloaded a content orhad accesstoitsdata(m<k).

Finding the maximum intersection betweenthe set of blocks disclosedtoanysetofcpeersisanNP-hardproblem[38],thuswe devisedaconservativeyeteﬃcientalgorithmtoevaluatedynami- callythenumberofblocksthatcanstillbesharedwithapeer.The algorithm is divided intotwo main functions: one to update the counterofblocksdisclosedtoapeer(Algorithm1– UpdateBlocks Disclosed),andthe othertodetermine thenumberofblocksthat can still bedisclosed toa peer(Algorithm2– Blocksto Disclose Left).ThevariablescommonersandblocksDisclosedarerespectively anarray sortedbythenumberofblocksdisclosed,andthemaxi- mumnumberofblocksdisclosedtoanysetofcpeers.

(9)

Algorithm1 UpdateBlocksDisclosed.

functionIncrementBlocksDisclosed(id) i←commoners.getIndex

(

^id

)

ifin

v

alidIndex

(

ⁱ

)

^then ^New.

commoners.push

(

^id

)

commoners.last.blks←1 i←commoners.getIndex

(

^id

)

else Known.

commoners[i].blks←commoners[i].blks+1 j←i−1

while

v

alidIndex

(

^j

)

^do ^j≥0 blksI←commoners[i].blks

blksJ←commoners[j].blks

if blksI>blksJthen Stillunsorted.

swap

(

^commoners^[ⁱ^]^,^commoners^[^j]

)

i← j

j← j−1

else Sorted.

break end if end while endif

ifi<c then Changesontopcpeers.

blocksDisclosed←blocksDisclosed+1 endif

endfunction

Algorithm2 BlockstoDiscloseLeft.

functionBlocksDiscloseLeft(id)

ifc>mor m≥kthen Invalid.

return0 endif

top←min

(

c,commoners.length

)

^Top^peers.

le ft←m−blocksDisclosed−

(

^c−top

)

i←commoners.getIndex

(

^id

)

ifin

v

alidIndex

(

ⁱ

)

^then ^New.

if commoners.length≥cthen

le ft←le ft+commoners[c−1].blks else

le ft←le ft+1 end if

else Known.

if i≥cthen

le ft←le ft+commoners[c−1].blks le ft←le ft−commoners[i].blks end if

endif returnle ft endfunction

Algorithm1 receivesasinput theid ofthepeerto whichone additional block was disclosed. If none has yet been disclosed, a new entry is created; otherwise, the entry is updated and, if needed, some elementsare swappedto keep thearray sorted.In eachcase,iftheupdatedentryisononeofthetopcpositions,the maximumnumberofblocksdisclosedisupdated.Algorithm1has lineartimecomplexity.

Algorithm2alsoreceivesasinputtheidofthepeer.Ifthecon- ﬁgured privacyrequirementsare not met(invalid), noblockscan be disclosed toany peer.If they are met,left contains thenum- berofblocksthatcanstillbedisclosed,ensuringthatatleastone block can be disclosed to each one of the top c peers; atmost, m−(^c−1)^blocks^can ^be ^disclosed^to^a ^single^peer. ^left^needs ^to be updated ifthere arealready atleastc peers andthe peerre- ferredbyidisoutsideofthatset.Algorithm2runsinlogarithmic time.

6.6.Blockselection

The block selection mechanism is used by commoners to de- terminewhichblockistobeoffered toa requestingpeer.Itplays animportantroleonhowtheblocksendupdistributedacrossthe network,affectingtheprobabilityofpeersobtainingusefulblocks.

Thismechanismensuresthatnouploadedblockisofferedtwiceto thesamepeer, anddetermineswhen requestsshould be refused.

Therequestrefusalmaybeduetothelackofusefulblockstooffer, duetoresource orprivacy constraints,ordueto content interest disguisestrategies.

Withtheaimofbalancingthedistributionofblocksacrossthe network,eachpeerattributesaweighttoeachblockitowns. The weight w_i of each block is updated according to the perception of the peer about its availability, which is based on the accep- tanceorcancellationoftheofferedblocks.Theblockstoofferare pickedthrough arandom weightedselection, andrecentlydown- loaded blocks start with a weight w_s. When a block is offered, if it is accepted, the weight is updated using an additive factor, w_λ;otherwise,theweightisupdatedusingamultiplicativefactor, w_η.Therefore,toensurethattheweightdecreaseonacceptanceis nevergreater than the one oncancellation, the weight update is givenbyEq.(1).

wi=

max

1,wi−w_μ

, ifitisaccepted max

1,

wi w_η

, otherwise wherew_μ=min wi−

wi

w_η

,w_λ

⁽¹⁾

Withthe MistrustfulP2P model there isno need tosuddenly terminate orremove downloads nor to stop sharing because the provided protection does not depend on the time a peer keeps sharing a content, as long as cover and genuine downloads are treatedthesameway.Toevaluateourmodel,weconsideredws= 100, w_λ=5, and w_η=2 with the goal of favoring more recent blocks.

6.7.Requestbackoff

Therequestbackoff mechanismdetermines thedelaybetween block requests to help maximizing the amount of useful blocks thatcanbeobtainedfromtheavailableblockrequestsintheshort- esttimeframe.Themechanismidentiﬁesthesetofpeerstowhich blockrequestscanbesent(eligiblepeers),anddeterminesforhow long no block requests should be sent. Therefore, asthe former is a direct result of individual peer behavior and the latter depends on the swarm behavior, we deﬁne the backoff time as a two-dimensionalvariable that hasper peerandper swarm components.Thepeerbackoff componentprovidesthedelaytoreturn apeertothe setofeligiblepeers; theswarm backoff component providesthedelayuntilthenextblockrequest.Theactualbackoff time israndomly generatedwithin theinterval [0,b], whereb is thecalculatedbackoff time.

A block request has ﬁve possible outcomes: (1) refusal – the requestisrefusedbythecontactedpeer;(2)cancellation– there- questiscanceledbytherequester(duplicateblock);(3)acceptance

(10)

O^UTCOME Peer A (Requester)

Peer B (Provider) Refusal

Cancellaon

Acceptance

Interrupon

Disposal

X

X X

Request Offer

Cancel Transfer

Accept

X

Fig. 7. Possible outcomes of a block request: refusal – the request is refused by peer B (no offer); cancellation – the request is canceled by peer A (no transfer);

acceptance – the request is accepted and the block is transferred from peer B to peer A ; interruption – the request is accepted but the block transfer is interrupted;

disposal – no request is sent.

– the request is accepted anda block is downloaded; (4) interruption– therequestisacceptedbutthedownloadisinterrupted;

(5)disposal – norequestissentduetothelackofeligible peers.

Refusaland disposal discloseno information about block ownership,butalltheothersdo.Cancellationandacceptancerevealthat both peers alreadyown that block; interruptionreveals that the contactedpeerownsthatblock.Fig.7illustrateseachpossibleout- come.

Consider uandvtobe respectivelythenumberofconsecutive refusedrequestsandthenumberofconsecutiverequeststhatwere eithercanceledorinterrupted,andu_iandv_ithesamevariablesbut fora peeri; by consecutivewemean untilarequest isaccepted.

Let

μ

^,

α

^,

λ

^,

β

^,

η

^,^and

τ

^¯ ^berespectivelythemaximumbackoff time, thebaselinearbackoff time,thelinearbackoff timefactor,thebase exponentialbackoff time,theexponentialbackoff time factor,and the estimatedaverage block transfer time. The backoff time b is thengivenbyEq.(2).

b=min

μ

,

α

+

λ

^u+

β η

^v−1

(2) Forblockrequeststhatdonotdiscloseblockinformation,u,itis appliedalinearincrease,

λ

^;^for^block^requests ^that^disclose^block

information,v,itisappliedanexponentialincrease,

η

^.^The^backoff

time never exceeds

μ

^.^The ^peer ^and^the ^swarm ^components ^are

expressedusing Eq.(2),butthe values oftheir variables maybe different.Thus,thevariablesofeachcomponentaredistinguished byapsubscript(peer)andanssubscript(swarm).Fortheswarm component only, when there are no eligible peers

∀

i,bp_i>0

, bs=min

λ

^p,min

bp_i

.

Forthe sakeofclarity andtractability,the evaluationassumes no simultaneous block requests, homogeneous Internet connections,andasinglecontentdownload.Therefore,inordertoevalu- ateourmodel,we considered

α

p=100 ms,

β

p=

τ

¯/4,

λ

p=

τ

¯/10,

η

^p=2,and

μ

^p=^k_m^τ^¯^c forthepeercomponent.Theperpeerback- off timeis a function of theblock transfer time, andis instanti-

Table 2

Summary of countermeasures employed to prevent common attacks.

Attack Countermeasure(s)

DoS Blacklist misbehaving peers (at application level).

Sybil and Collusion

Avoid block advertisement; employ the disclosure constraint mechanism; treat multiple peers using related IP addresses as pseudonymous identities of a single entity.

Peer Selection Use multiple unrelated trackers for the same content.

Forgery and Repetition

Peers communicate through direct links.

Pollution Each block has a checksum signed by the publisher.

ated such that the block requests sent to the samepeer are, on average,50msapartandsuchthat,onaverage,m/cblockrequests canbesenttoeachpeerduringcontentdownload.m/crepresents the conﬁgured protectionand k

τ

¯ ^the^minimum time required to downloadagivencontent.Fortheswarmcomponent, weconsid- ered

α

^s=0,

β

^s=

τ

¯/4

ρ

,

λ

^s=

τ

¯/16

ρ

,

η

^p=2,and

μ

^s=

τ

¯,where

ρ

isthenumberofactivepeers.Theswarmbackoff timeisafunction ofbothblocktransfertimeandthenumberofactivepeers,which canbeobtainedfromthetracker(s).Giventhattheevaluationas- sumesnosimultaneousblockrequests,thereisnominimumdelay requiredbetweenconsecutiverequests,whichcanbe,atmost,the timerequiredtodownloadasingleblock.

6.8. Peerselection

The peer selection mechanism aims at selecting the eligible peer that has thehighest probability of providing a usefulblock in lesstime. The set of eligible peers isconstrained both by the disclosureconstraintandrequestbackoff mechanisms.Theformer provides,foragivencontent,thesetofpeerstowhichnofurther block requests can be sent to; the latter provides, for the same content, theset ofpeers that are ineligible forthe moment,and whencanthenextblockrequestbesenttoeachoneofthem.

Forthesakeofclarityandtractability,peerswereselectedran- domlyinordertoavoidaddinganadditionalfactorintotheeval- uation. Non-uniform selection of peers is expected to impact on howthe blocksendup distributedacross thenetwork, especially whenconsideringheterogeneousInternet connections:peerswith more available bandwidth are able to replicate more rapidly the blockstheyown.

7. Securityanalysis

Thissectionprovides asecurityanalysisofthe MistrustfulP2P model.WestartbydescribingthecommonattacksinP2Psystems, andpresentthecountermeasuresemployed.Then, wediscussthe identiﬁcation of user content interests, the proof of full content download,anduserlegalliability.

7.1. Commonattacksandcountermeasures

This section describes the common attacks in P2P ﬁle shar- ingandthecountermeasuresemployed,whicharesummarizedin Table 2, focusing on the context of our work. The attack model consideredassumesthat an attackercan beanyentityparticipat- inginthesystem– apublisher,atracker,aseederoracommoner – ora group ofthoseentities incollusion.Externalentities, such asISPsandgovernments,areoutofthescopeofthiswork.

DoS. A denial-of-service attack aims eitheratshutting down the entire systemor at making one or more contents unavailable. It maytargettrackerstopreventpeersfromknowingeachotherand thereforedisablecontentsharing,ortargetseederstopreventnew