Machine learning over encrypted data = Aprendizagem de máquina sobre dados cifrados

Universidade Estadual de Campinas Instituto de Computação

INSTITUTO DE COMPUTAÇÃO

Hilder Vitor Lima Pereira

Machine Learning over Encrypted Data

Aprendizagem de máquina sobre dados cifrados

CAMPINAS

2016

Dissertação apresentada ao Instituto de Computação da Universidade Estadual de Campinas como parte dos requisitos para a obtenção do título de Mestre em Ciência da Computação.

Thesis presented to the Institute of Computing of the University of Campinas in partial fulfillment of the requirements for the degree of Master in Computer Science.

Supervisor/Orientador: Prof. Dr. Diego de Freitas Aranha

Este exemplar corresponde à versão final da Dissertação defendida por Hilder Vitor Lima Pereira e orientada pelo Prof. Dr. Diego de Freitas Aranha.

Agência(s) de fomento e nº(s) de processo(s): Não se aplica.

Ficha catalográfica

Universidade Estadual de Campinas

Biblioteca do Instituto de Matemática, Estatística e Computação Científica Ana Regina Machado - CRB 8/5467

Pereira, Hilder Vitor Lima,

P414m Machine learning over encrypted data / Hilder Vitor Lima Pereira. – Campinas, SP : [s.n.], 2016.

Orientador: Diego de Freitas Aranha.

Dissertação (mestrado) – Universidade Estadual de Campinas, Instituto de Computação.

1. Aprendizado do computador. 2. Criptografia. 3. Criptografia homomórfica. I. Aranha, Diego de Freitas, 1982-. II. Universidade Estadual de Campinas. Instituto de Computação. III. Título.

Informações para Biblioteca Digital

Título em outro idioma: Aprendizagem de máquina sobre dados cifrados

Palavras-chave em inglês: Machine learning; Cryptography; Homomorphic encryption

Área de concentração: Ciência da Computação

Titulação: Mestre em Ciência da Computação

Banca examinadora: Diego de Freitas Aranha [Orientador]; Marco Aurelio Amaral Henriques; Siome Klein Goldenstein

Data de defesa: 23-09-2016

Programa de Pós-Graduação: Ciência da Computação

Banca Examinadora:

• Prof. Dr. Diego de Freitas Aranha IC/UNICAMP

• Prof. Dr. Marco Aurelio Amaral Henriques FEEC/UNICAMP

• Prof. Dr. Siome Klein Goldenstein IC/UNICAMP

A ata da defesa com as respectivas assinaturas dos membros da banca encontra-se no processo de vida acadêmica do aluno.

Acknowledgements

I would like to thank the laboratories LASCA and LMCAD and their members for receiving me and providing resources to perform my research. I am also grateful to Google for the grants they awarded me. They were very important in giving me comfortable conditions to study and to participate in conferences and workshops.

I certainly have a lot of professors to thank, but two of them deserve special acknowledgements: first, professor Diego Aranha, my advisor, for using his experience to guide me in a way that allowed me to obtain good results, for all his suggestions and recommendations, and also for being a pleasant person whom it is easy and nice to work with; the second one is professor Alfredo Goldman, whom I have to thank for being a great person and for helping me infinitely. Without him, I would not be finishing my Masters now.

And the most heartfelt acknowledgements are for my mother, Ana Lícia Lima Pereira, and for my father, Hilmar Rodrigues Pereira, who, despite all the socioeconomic pressure and difficulties, were able to cleverly see the opportunities that formal education could give me and to artfully supply me the means to begin my undergraduate course. I will be eternally grateful to you both.

Resumo

Aprendizado de máquina normalmente exige que grandes quantidades de dados sensíveis sejam compartilhados, o que é notoriamente intrusivo em termos de privacidade. Sendo assim, terceirizar a computação de tais algoritmos para a nuvem requer que o servidor seja confiável, o que introduz um pressuposto de segurança não-realista e com alto risco de abusos ou violações. Mas se a nuvem fosse capaz de realizar cálculos sobre dados cifrados, ao invés de computar sobre dados às claras, como de costume, os clientes poderiam então cifrar seus dados antes de enviá-los à nuvem, minimizando as suposições de segurança feitas sobre o servidor.

Neste trabalho, são propostas versões que preservam a privacidade do classificador k-NN e do algoritmo para Análise de Componentes Principais que podem ser executadas sobre dados cifrados. Os algoritmos para Análise de Componentes Principais são baseados em esquemas de criptografia homomórfica, enquanto as versões do k-NN (uniforme e ponderado) apresentadas aqui combinam criptografia que preserva a ordem com criptografia homomórfica.

Os resultados experimentais mostram que as variantes privativas do k-NN alcançam a mesma precisão que o classificador k-NN convencional, mas com impactos consideráveis no desempenho. No entanto, essa penalidade no tempo de execução não significa que as soluções não podem ser usadas na prática quando as propriedades de segurança fornecidas pela abordagem são examinadas em detalhe.

Os algoritmos propostos para a Análise de Componentes Principais têm profundidade multiplicativa linear no número de componentes a serem encontradas, mas mesmo assim, os tempos de execução ainda são proibitivos, principalmente devido à falta de técnicas eficientes para codificar os valores de dupla precisão dos conjuntos de dados para textos claros do esquema homomórfico utilizado na execução.

Para ambos os algoritmos propostos, o servidor em nuvem não precisa ser confiável para além da correta execução do protocolo (modelo semi-honesto) e pode avaliar os algoritmos sobre dados cifrados, não aprendendo assim os valores do conjunto de dados original nem as respostas devolvidas, já que as saídas de algoritmos homomórficos são também cifradas.

Abstract

Machine learning tasks typically require large amounts of sensitive data to be shared, which is notoriously intrusive in terms of privacy. Outsourcing this computation to the cloud requires the server to be trusted, introducing a non-realistic security assumption and high risk of abuse or data breaches. But if the cloud were able to perform computations over encrypted data instead of plaintext data as usual, clients could then encrypt their data before sending it to the cloud, minimizing the security assumptions made about the server.

In this work, privacy-preserving versions of Principal Component Analysis (PCA) and of the k-NN classifier that can be evaluated over ciphertexts by a cloud server are proposed. The PCA algorithms are based on homomorphic encryption schemes, while the k-NN classifier combines homomorphic encryption with order-preserving encryption.

Experimental results show that the privacy-preserving variants of k-NN achieve the same accuracy as the conventional k-NN classifiers, but with a considerable impact on performance. The performance penalty is, however, still viable for practical use when the security properties provided by the approach are examined in detail.

The proposed algorithms for Principal Component Analysis have multiplicative depth linear in the number of principal components to be found, but even so, the running times are still prohibitive due to a lack of efficient techniques to encode the double values of the datasets into the plaintexts of the homomorphic scheme used in the implementation.

For both the proposed algorithms, the cloud server does not need to be trusted beyond correct execution of the protocol (semi-honest model) and computes the algorithms over encrypted data, never learning the responses output by them or the real dataset values.

List of Figures

3.1 Diagonal, bidiagonal, and tridiagonal matrices

3.2 Circuit with multiplicative depth equal to 3

3.3 Circuit with multiplicative depth equal to 2

4.1 Data expansion

List of Tables

3.1 Table of additions of the ring $\mathbb{Z}_3$

3.2 Table of products of the ring $\mathbb{Z}_3$

4.1 Datasets used in the evaluation

4.2 Accuracies of unweighted k-NN

4.3 Accuracies of weighted k-NN

4.4 Running times to classify a single instance

4.5 Comparison of running times in seconds to encrypt the datasets

5.1 Execution times for datasets with N samples of P variables

5.2 Angle between PCs found and expected PCs

Chapter 1 Introduction

Cloud computing has become the dominant paradigm, as several large companies such as Amazon, Google and Microsoft invest in the area. Beyond storage and backup services, outsourced computation has been standing out. For illustration, there are already many solutions offering Machine Learning as a Service, such as Amazon Machine Learning, Ersatz Labs, Google Prediction API, and Microsoft Azure Machine Learning, to cite a few. Those solutions appeal to convenience: users are not required to maintain a large infrastructure, with several machines and an on-site technical staff, to analyze their data. They just have to submit the data to the cloud and get back the answers, which can be models, classifications, or regressions, among others.

As an example, consider the scenario where a hospital stores a large database containing diagnosis information about tumors found in patients: size, shape, age, and whether they are benign or malignant neoplasms. After collecting this information from a new patient, the hospital wants to classify a tumor. In such a case, the hospital could send that database to some machine learning service in the cloud to generate a model and then send the information about that new patient. The cloud would classify the tumor based on the model and return the classification to the hospital.

Despite the low operating costs, high availability of storage capacity and computational power, “one of the barriers to adoption of cloud services is concern over the privacy and confidentiality of the data being handled by the cloud, and the commercial value of that data or the regulations protecting the handling of sensitive data” [21]. With corporations and governments becoming more intrusive in their data collection and surveillance efforts, and the recurrent data breaches observed in the last years, the cloud paradigm faces a challenge to remain the computing model of choice for privacy-sensitive applications. In particular, there are no formal guarantees that the cloud provider is not behaving in abusive or intrusive ways, or even that the infrastructure is protected against external attacks. Different legal regimes and governmental influence introduce further complications to the problem and may shift responsibilities in unclear ways. The long-term financial impact from the current crisis in cloud provider trust is estimated at between 35 and 180 billion dollars in 2016 in the USA alone [33].

A possible way to address the privacy concern consists in computing over encrypted data, a model in which the users encrypt their data before sending it to the cloud and all the outsourced computations are done without decrypting the data, which generates results that are also encrypted and, therefore, only understandable by the clients after decryption. This is performed by encrypting the data with schemes that preserve part of the plaintext structure and allow further execution of certain operations. Since the encryption schemes employed are secure (aside from leaking the information about the plaintexts that is necessary for performing the computation), the cloud (and anyone who manages to break into the servers) cannot learn what the original data was.

However, the privacy gain does not come for free. In general, there is a trade-off between security and efficiency. The more secure the solution is, the less information the cloud learns about the data but the longer it will take to be processed.

There is also a trade-off between efficiency and functionality. Using fully homomorphic encryption (FHE), it would be possible to evaluate arbitrary functions over encrypted data. In particular, the cloud could execute any machine learning task in a privacy-preserving way. The first FHE scheme was proposed in 2009 by Craig Gentry [19]. This was a breakthrough because it showed that it is possible to construct FHE schemes, but from a practical point of view it did not represent an immediate solution to the problem of computing over encrypted data, because FHE is still very impractical [28].

Instead of using FHE, more limited cryptographic schemes may be used in order to obtain improved efficiency, for instance schemes that are homomorphic regarding only one operation or schemes that are homomorphic for two operations, but with limitations on the number of operations that can be performed, like Somewhat Homomorphic Encryption (SHE) or Leveled Homomorphic Encryption (LHE) schemes. The problem of using such non-fully homomorphic schemes is that since they impose restrictions on the number or types of operations that can be performed, the functions to be evaluated have to be redesigned to meet those restrictions.

For example, an additively homomorphic encryption scheme allows us to add two ciphertexts in a meaningful way, that is, generating a third ciphertext that decrypts to the addition of the corresponding plaintexts. Therefore, to homomorphically evaluate an algorithm using a scheme of this type, we would have to write all the operations in terms of additions, for instance, by replacing multiplications of the type n · x by n − 1 additions of x to itself. Of course, not every computable function can be represented using only additions; therefore, an additively homomorphic encryption scheme cannot be used to homomorphically evaluate arbitrary functions.

SHE or LHE schemes are homomorphic regarding additions and multiplications, but only for a limited number of them. Using these schemes makes it possible to homomorphically perform several additions, but just a few multiplications, which means it is still necessary to remodel algorithms to be evaluated homomorphically, but at least they are far more efficient than FHE in terms of computational costs.
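The additive homomorphism described above can be sketched with a toy version of the Paillier cryptosystem. Paillier is used here only as a well-known additively homomorphic scheme; this passage does not name a particular scheme, and the function names and the tiny primes are assumptions made for the illustration (they offer no security). Multiplying two ciphertexts yields an encryption of the sum of the plaintexts, and k · x is obtained with k − 1 homomorphic additions.

```python
import math
import random

def keygen(p=1789, q=1861):
    """Toy Paillier key generation; p and q are small primes (insecure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt an integer 0 <= m < n under the public modulus n."""
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, sk, c):
    lam, mu = sk
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def add(n, c1, c2):
    """Homomorphic addition: the product of ciphertexts encrypts the sum."""
    return (c1 * c2) % (n * n)

def times(n, c, k):
    """Encryption of k * x obtained with k - 1 homomorphic additions."""
    acc = c
    for _ in range(k - 1):
        acc = add(n, acc, c)
    return acc

n, sk = keygen()
c_sum = add(n, encrypt(n, 20), encrypt(n, 22))
c_mul = times(n, encrypt(n, 7), 6)
print(decrypt(n, sk, c_sum), decrypt(n, sk, c_mul))  # prints: 42 42
```

The server in this sketch only ever calls `add` and `times`, which need the public modulus but not the secret key.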

A growing research area is thus dedicated to studying how to adapt algorithms to work correctly in the encrypted domain under the restrictions imposed by such non-fully homomorphic schemes. The authors of [21] showed general techniques that can be used to adapt algorithms to make them computable over encrypted data using LHE and applied those techniques to create privacy-preserving versions of the Linear Means Classifier and Fisher’s Linear Discriminant Classifier. Other authors have attempted to create homomorphic versions of different algorithms [41, 15] or to improve the efficiency of the existing ones in order to evaluate them over bigger datasets [40].

1.1 Objectives

The main objective of this work is to provide a good understanding of homomorphic encryption and other related schemes that have special properties in addition to the traditional secrecy/confidentiality guarantees, and of how such schemes may be used to construct homomorphic versions of machine learning algorithms that can be evaluated in a privacy-preserving way by a third party, following the model of outsourced computation. In particular, that knowledge is applied to develop privacy-preserving versions of two machine learning algorithms that did not yet have homomorphic versions.

1.2 Contributions

To achieve our objective, privacy-preserving cloud-based versions of two well-known machine learning algorithms were developed and implemented: the k-NN classifier and Principal Component Analysis (PCA). Both can be evaluated over encrypted data by a cloud service provider using SHE or LHE schemes, thus matching the Machine Learning as a Service model. The user needs only to encrypt the data once and send it to the server, where the computation is performed and from which an encrypted response is returned afterwards. As far as we know, this is the first time those algorithms are implemented over encrypted data considering the cloud scenario of outsourced computation.

For the k-NN classifier, we designed both unweighted and distance-weighted versions that present good performance overall and almost the same accuracy as the conventional versions.

The usual ways to perform Principal Component Analysis were extensively studied and the difficulty in adapting them to be evaluated homomorphically was demonstrated; thus, an ad-hoc version was created. This version has a low multiplicative depth, which means it is suitable to be evaluated using SHE or LHE schemes. However, the current lack of good techniques to encode double values into the plaintext spaces of such encryption schemes impacts the running times of the proof-of-concept implementation presented here. In order to minimize such impact, an interactive version of the privacy-preserving Principal Component Analysis was also formulated.

The servers are supposed to follow the semi-honest model, also known as the honest-but-curious model, which means the server will follow the specified protocol, performing all the calculations correctly and returning the right answer to the clients, but can store all the data provided by the clients to do additional computations afterwards in order to obtain extra information.

Under this model, a server executing the privacy-preserving variants of k-NN only learns the number of instances and the number of variables, that is, the size of the dataset, but does not learn the values of any instance’s attribute nor the number of classes. The classes assigned after the classification is finished are not learned either. The same holds for the Principal Component Analysis, which means the server only learns the size of the dataset submitted to it.

The work developed here yielded two papers, namely Principal Component Analysis over encrypted data and Non-interactive privacy-preserving k-NN. The first one was published as an extended abstract at SBSeg 2015 (Simpósio Brasileiro em Segurança da Informação e de Sistemas Computacionais) and also presented at WHEAT 2016 (Workshop Homomorphic Encryption Applications and Technology, Paris). The second one, about k-NN, will be submitted to international conferences.

1.3 Document Organization

The document is organized as follows. Chapter 2 discusses related works from the scientific literature. Chapter 3 presents some fundamental concepts required to understand the solutions developed here, like definitions and theorems from Abstract and Linear Algebra, and the cryptographic primitives employed, and also discusses some studies about computing over encrypted data.

Chapter 4 discusses the k-NN formulation and presents our proposed privacy-preserving versions of both the uniform and the distance-weighted variants, together with practical experimental results. Chapter 5 follows the same structure, discussing some applications of PCA computation and presenting our proposed privacy-preserving version, together with the experimental results.

Conclusions are presented in the last chapter, after which we list the references used in this work.

Chapter 2 Related Works

The authors of [28] discussed the practical limitations of FHE schemes and showed that SHE is an alternative for the problem of computing over encrypted data. As a proof-of-concept, they implemented simple functions homomorphically: an average of n integers, a standard deviation, and a logistic regression.

Following that work, in the paper [21], LHE was used to evaluate the binary classifiers Linear Means and Fisher’s Linear Discriminant over encrypted data. That was done by encoding real numbers using a scale technique and by eliminating the divisions present in the original algorithms. The authors also briefly mentioned that PCA could be solved homomorphically by gradient descent, but did not provide an actual construction. Bost et al. [8] studied the classification problem using modified versions of more complex classifiers, such as decision trees and simple Bayesian inference for medical applications. As a result, the authors constructed privacy-preserving classifiers, but at a high communication cost due to consecutive communication rounds. Other works dealt with the problem of large-scale statistical analysis [40] for linear regression and other useful metrics for machine learning tasks; or the design of interactive protocols for clustering over encrypted data [23], such as applying the classical k-means algorithm [32] for two-party execution.

More recently, neural networks were considered in this scenario of computing over encrypted data. First they were studied from a theoretical point of view in [41], where the authors concluded that making inferences homomorphically, over an encrypted instance, would be possible, but the learning stage (where the model is generated) might be infeasible because it is necessary to use high degree polynomials to approximate the functions used in this stage. Then, LHE was used to evaluate an already trained neural network over encrypted data [15]. The authors managed to obtain an accuracy of 99% and a good throughput but did not address the problem of evaluating the learning stage.

Several works in the literature already studied problems related to privacy-preserving k-NN classification. However, solutions were provided for different scenarios that involved distributed processing between agents with equivalent computing power; or for simpler versions of the problem, only requiring computation of the k nearest neighbors and ignoring the classification step.
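The scale technique mentioned above can be pictured as a fixed-point encoding: a real value is multiplied by a fixed factor and rounded to an integer, so that only integer additions and multiplications are needed. The following is a minimal sketch of that general idea and not the exact encoding used in [21]; the names and the scale of 1000 are assumptions. The accumulated scale is divided out only after decoding, which is one way to avoid divisions during the computation itself.

```python
SCALE = 1000  # hypothetical precision: three decimal digits

def encode(x, scale=SCALE):
    """Map a real value to an integer by scaling and rounding."""
    return round(x * scale)

def decode(v, scale=SCALE, mults=0):
    """Undo the scaling; each multiplication of two encoded values
    contributes one extra factor of `scale` to the result."""
    return v / scale ** (mults + 1)

a, b = encode(1.25), encode(2.5)
total = a + b      # integer addition only
product = a * b    # integer multiplication only
print(decode(total), decode(product, mults=1))  # prints: 3.75 3.125
```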

The authors of [45] considered a scenario known as vertically partitioned data, where each of several parties holds a set of attributes of the same instances and they want to run the k-NN classifier on the concatenation of their datasets. An interactive privacy-preserving protocol was proposed in which the parties have to compute the distances between the instances in their own partition and a query vector, and combine those distances using an additively homomorphic encryption scheme with random perturbation techniques to find the k nearest neighbors. The classification step is finally performed locally by each party.

In [42, 43] the authors assumed that several data owners, each one with a private database, would collaborate by executing a distributed protocol to perform privacy-preserving k-NN classification. The classification of a new instance was performed by each user in his or her own database and then a secure distributed protocol was used to classify the instance based on the k nearest neighbors of each database, without revealing those neighbors to the other data owners. It means that the query vector is revealed and the process is interactive, with heavy processing load for each involved party.

In the article [10], the authors presented three methods to find the k nearest neighbors preserving the privacy of the data, but they did not address the classification problem. Furthermore, the three methods were interactive. It is worth noting that even if finding the k nearest neighbors is the main step involved in k-NN classification, stopping at this step is not compatible with a cloud computing scenario, since it implies that the client has to store at least a table relating the vectors in the dataset to their classes, and also that the query vector must be classified locally after the client receives the k nearest neighbors.

The work [16] considered a different scenario: the data owner encrypts the data and submits them to a first server, sending the secret key to a second server. Thereby, any authorized person is able to send a query vector to the first server, which runs a distributed protocol with the second server (this server may decrypt some data in this process), and finally the first server returns the k nearest neighbors. Notice that, again, the classification is not performed, because the authors were just interested in finding the nearest neighbors. Also, even if the client does not have to process the data, that method requires a trusted server to store the private key, and this trusted server acts as the client in the distributed processing scenario. Relying on a trusted third party naturally introduces substantial additional risk.

Another approach was proposed in [39], where a new cryptographic scheme called asymmetric scalar-product-preserving encryption (ASPE) was introduced. The scheme preserves a special type of scalar product, allowing the k nearest vectors to be found without requiring an interactive process. It allows the server to obtain inner products between dataset vectors by calculating the inner product of the encrypted vectors, and thereby to determine the vectors that are closest to the query vector. However, the authors were again only concerned with the task of finding the nearest neighbors, not with the classification problem. Also, a cryptographic scheme created ad hoc for this task lacks the extensive security analysis that more general and well-established cryptographic schemes already have.

Chapter 3 Preliminaries

We start this chapter by presenting some fundamental concepts that are necessary to understand our contributions. Some concepts, definitions and basic results of Abstract and Linear Algebra are shown, as well as the cryptographic schemes used here. We also present the concept of secure multi-party computation and how it relates to this work. Finally some studies about computing over encrypted data are presented and discussed.

3.1 Linear Algebra

3.1.1 Vector spaces

We denote by $\mathbb{R}^n$ the vector space of n-dimensional vectors with real entries, i.e., its elements have the form $(v_1, v_2, \ldots, v_n)$ with each $v_i \in \mathbb{R}$. Real numbers are called scalars in this context.

Vector spaces are closed under vector addition and multiplication by scalars and those operations are defined by component-wise operations:

• Addition: Given vectors $v = (v_1, v_2, \ldots, v_n)$ and $u = (u_1, u_2, \ldots, u_n)$, the addition is defined as $u + v = (v_1 + u_1, v_2 + u_2, \ldots, v_n + u_n)$.

• Product: Given a vector $v = (v_1, v_2, \ldots, v_n)$ and a scalar $\alpha$, the product by scalar is defined as $\alpha \cdot v = (\alpha v_1, \alpha v_2, \ldots, \alpha v_n)$.

Notice that, using those definitions, it is trivial to check that there exists an identity element for each operation: the zero vector $0 = (0, 0, \ldots, 0)$ for addition and the number 1 for the product by scalar, because $v + 0 = (v_1 + 0, v_2 + 0, \ldots, v_n + 0) = v$ and $1 \cdot v = (1 v_1, 1 v_2, \ldots, 1 v_n) = v$. Moreover, it follows directly from the commutativity and associativity of $\mathbb{R}$ that both operations are commutative and associative.

Naturally, we can define division by a nonzero scalar as a multiplication between a vector and the inverse of the scalar, thus, $\frac{v}{\alpha} = v \cdot \frac{1}{\alpha}$.

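Given vectors $w_1, w_2, \ldots, w_m$, we say that they are linearly independent if the equation $\alpha_1 \cdot w_1 + \alpha_2 \cdot w_2 + \cdots + \alpha_m \cdot w_m = 0$, where $\alpha_1, \ldots, \alpha_m$ are scalars, has $\alpha_1 = \alpha_2 = \cdots = \alpha_m = 0$ as its only solution. Otherwise, those vectors are linearly dependent. Notice that when vectors are linearly dependent, we can write

$$w_\ell = -\frac{\alpha_1}{\alpha_\ell} \cdot w_1 - \cdots - \frac{\alpha_m}{\alpha_\ell} \cdot w_m \quad \text{if } \alpha_\ell \neq 0,$$

where the sum runs over the indices other than $\ell$, and then we say that $w_\ell$ is a linear combination of the other m − 1 vectors.

$$D = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad B_1 = \begin{bmatrix} 1 & 2 & 0 & 0 \\ 0 & 7 & 3 & 0 \\ 0 & 0 & 5 & 9 \\ 0 & 0 & 0 & 8 \end{bmatrix}, \quad B_2 = \begin{bmatrix} 5 & 0 & 0 & 0 \\ 1 & 6 & 0 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 3 & 8 \\ 0 & 0 & 0 & 5 \end{bmatrix}, \quad T = \begin{bmatrix} 1 & 2 & 0 & 0 & 0 \\ 2 & 7 & 3 & 0 & 0 \\ 0 & 7 & 5 & 9 & 0 \\ 0 & 0 & 8 & 8 & 2 \\ 0 & 0 & 0 & 1 & 7 \end{bmatrix}$$

Figure 3.1: Matrix D is diagonal, both B1 and B2 are bidiagonal, and T is tridiagonal.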

For instance, the vector (3, 7) is a linear combination of (3, 0) and (0, 7), because (3, 7) = 1 · (3, 0) + 1 · (0, 7). Also, it is a linear combination of (1, 0) and (0, 1), since (3, 7) = 3 · (1, 0) + 7 · (0, 1). In fact, all vectors in $\mathbb{R}^2$ are linear combinations of (1, 0) and (0, 1), because if $v = (v_1, v_2) \in \mathbb{R}^2$, then $v = (v_1, 0) + (0, v_2) = v_1 \cdot (1, 0) + v_2 \cdot (0, 1)$. When a set of vectors has that property, i.e., all vectors can be written as linear combinations of them, then we say that those vectors span the vector space. In addition to that, if those vectors are linearly independent, then we say that they form a basis of the vector space. Thus, the set of vectors {(1, 0), (0, 1)} is a basis of $\mathbb{R}^2$ and it is called the standard basis.

More generally, for any $n \in \mathbb{N}^*$, the n-dimensional vectors (1, 0, ..., 0), (0, 1, ..., 0), ..., (0, 0, ..., 1) form the canonical basis of $\mathbb{R}^n$. Indeed, any set of n linearly independent vectors forms a basis for $\mathbb{R}^n$.
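The two-dimensional examples above can be checked mechanically. The sketch below is only an illustration (the helper names are assumptions): it tests linear independence of two vectors of the plane with a 2×2 determinant and recovers the coefficients of a linear combination with Cramer's rule, using exact rational arithmetic.

```python
from fractions import Fraction

def det2(a, b):
    """Determinant of the 2x2 matrix whose columns are a and b."""
    return a[0] * b[1] - a[1] * b[0]

def coordinates(v, b1, b2):
    """Return (a1, a2) such that v = a1*b1 + a2*b2 (Cramer's rule)."""
    d = det2(b1, b2)
    if d == 0:
        raise ValueError("b1 and b2 are linearly dependent, not a basis")
    return Fraction(det2(v, b2), d), Fraction(det2(b1, v), d)

print(coordinates((3, 7), (3, 0), (0, 7)))   # coefficients 1 and 1
print(coordinates((3, 7), (1, 0), (0, 1)))   # coefficients 3 and 7
print(coordinates((3, 7), (1, 1), (1, -1)))  # coefficients 5 and -2
```

The last call uses a basis other than the standard one: 5 · (1, 1) − 2 · (1, −1) = (3, 7).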

3.1.2 Matrices

We denote by $\mathbb{R}^{n \times p}$ the set of matrices with n rows and p columns whose entries are all real values. The entry that lies in the i-th row and in the j-th column of a matrix A is denoted by $A_{i,j}$. The transpose of a matrix $A \in \mathbb{R}^{n \times p}$ is denoted by $A^T$ and is defined as the unique matrix $A^T \in \mathbb{R}^{p \times n}$ such that $A_{i,j} = A^T_{j,i}$ for all entries of A.

A diagonal matrix is any matrix $A \in \mathbb{R}^{n \times p}$ such that all of its entries $A_{i,j}$ are zero if $i \neq j$. A bidiagonal matrix is a matrix in which all the elements outside the main diagonal and outside either the diagonal above or the diagonal below are zero. A tridiagonal matrix has all the elements outside the main diagonal and outside both the diagonal above and the diagonal below equal to zero. Figure 3.1 shows some examples.

The product of matrices $A \in \mathbb{R}^{n \times p}$ and $B \in \mathbb{R}^{p \times m}$ is a matrix $C \in \mathbb{R}^{n \times m}$ whose entries are

$$C_{i,j} = \sum_{k=1}^{p} A_{i,k} \cdot B_{k,j}.$$

Hence, if we look at vectors of $\mathbb{R}^p$ as elements of $\mathbb{R}^{p \times 1}$, that is, as column vectors, we can think of products between matrices and vectors, and even products between vectors, as products of matrices. In particular, the inner product between two vectors u and v of $\mathbb{R}^{p \times 1}$ is defined as the product $u^T \cdot v$, that is, a product between an element of $\mathbb{R}^{1 \times p}$ and one of $\mathbb{R}^{p \times 1}$, which generates an element of $\mathbb{R}^{1 \times 1}$, or, in other words, a scalar. The outer product between those vectors is defined as $u \cdot v^T$, which results in an element of $\mathbb{R}^{p \times p}$.
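To make the matrix view of inner and outer products concrete, here is a small sketch in plain Python in which vectors are stored as p×1 matrices (lists of rows). The function names are assumptions chosen for the illustration; the entry formula is the usual sum over the shared dimension.

```python
def transpose(A):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Product of an n x p matrix A and a p x m matrix B."""
    n, p, m = len(A), len(B), len(B[0])
    assert all(len(row) == p for row in A), "inner dimensions must agree"
    return [[sum(A[i][k] * B[k][j] for k in range(p)) for j in range(m)]
            for i in range(n)]

u = [[1], [2], [3]]  # column vectors, i.e. elements of R^{3x1}
v = [[4], [5], [6]]

inner = matmul(transpose(u), v)  # 1x1 matrix holding a scalar
outer = matmul(u, transpose(v))  # 3x3 matrix

print(inner)  # prints: [[32]]
print(outer)  # prints: [[4, 5, 6], [8, 10, 12], [12, 15, 18]]
```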

Two vectors u and v are said to be orthogonal if $u^T \cdot v = 0$. If, in addition to that, they are normalized, then we say that u and v are orthonormal vectors. A square matrix A is orthogonal if any two different columns $A_i$ and $A_j$ are orthonormal. Notice that if A is orthogonal, then the products $A^T A$ and $A A^T$ are equal and result in the identity matrix. The (Euclidean) norm of a vector u is $\|u\| = \sqrt{u^T u}$. The (Euclidean) distance between two vectors u and v is $\|u - v\|$. We say that a vector is normalized if its norm is equal to 1. Any nonzero vector can be normalized if divided by its own norm.

If a square matrix X is equal to its own transpose, then we say that X is a symmetric matrix. For any symmetric matrices X and Y of the same dimensions, and any $n \in \mathbb{N}^*$, the addition X + Y, the subtraction X − Y, and the exponentiation $X^n$ are also symmetric matrices.

It is important to know the properties that symmetric matrices have because, as discussed in section 5.1, when trying to compute the Principal Component Analysis, we have to deal with the covariance matrix of a data matrix, and covariance matrices are symmetric. It is easy to see that because, for any data matrix $X \in \mathbb{R}^{n \times p}$, the covariance matrix C is equal to $\frac{\bar{X}^T \bar{X}}{n - 1}$, where $\bar{X}$ is the matrix X with the mean of each column set to zero, i.e., the mean of each column is calculated and subtracted from the column to obtain $\bar{X}$.

Moreover, all covariance matrices $C \in \mathbb{R}^{p \times p}$ are positive semi-definite, that is, in addition to being symmetric, they satisfy the inequality $u^T C u \geq 0$ for any vector $u \in \mathbb{R}^p$. To check that fact, let $y = \bar{X} u$. Note that y is an n-dimensional vector. Thus

$$u^T C u = u^T \frac{\bar{X}^T \bar{X}}{n - 1} u = \frac{1}{n - 1} (u^T \bar{X}^T)(\bar{X} u) = \frac{1}{n - 1} y^T y = \frac{1}{n - 1} \sum_{i=1}^{n} y_i^2 \geq 0.$$

3.1.3 Eigenvalues and eigenvectors

For any square matrix $X$, we say that a scalar $\lambda \in \mathbb{R}$ is an eigenvalue of $X$ if there exists a nonzero vector $v$ such that $Xv = \lambda v$. Any such $v$ is called an eigenvector of $X$ associated with $\lambda$. If $v$ is an eigenvector associated with an eigenvalue $\lambda$, then we call $(\lambda, v)$ an eigenpair.

The eigenvalue of largest absolute value is called the first dominant eigenvalue, the one with the second largest absolute value is called the second dominant eigenvalue, and so on. The first dominant eigenvalue is simply called the dominant eigenvalue. An eigenvector $v$ associated with the $i$-th dominant eigenvalue $\lambda$ is called an $i$-th dominant eigenvector and the pair $(\lambda, v)$ is called an $i$-th dominant eigenpair.

For instance, consider the following matrix:

$$X = \begin{pmatrix} 0 & 4 & 0 \\ 5 & -1 & 8 \\ 0 & 1 & 6 \end{pmatrix}$$

The eigenvalues of $X$ are approximately −5.4, 7.4, and 3. Therefore, ordering by absolute value, we can see that the dominant eigenvalue is 7.4, the second dominant one is −5.4, and the third dominant one is 3. Any eigenvector associated with 7.4 is a dominant eigenvector and any eigenvector associated with −5.4 is a second dominant eigenvector.
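As a quick numerical check of this example, the sketch below estimates the dominant eigenpair of X with plain power iteration. The choice of power iteration, the starting vector, the iteration count, and all helper names are illustrative assumptions, not necessarily the procedure adopted elsewhere in this work. The exact eigenvalues of X are 3 and 1 ± √41, which round to the values quoted above.

```python
import math

# Example matrix from the text.
X = [[0, 4, 0],
     [5, -1, 8],
     [0, 1, 6]]


def mat_vec(A, v):
    """Multiply the matrix A (list of rows) by the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dominant_eigenpair(A, iterations=300):
    """Estimate the dominant eigenpair of A by power iteration."""
    v = [1.0] * len(A)  # arbitrary starting vector (assumption)
    for _ in range(iterations):
        w = mat_vec(A, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]  # keep the iterate normalized
    Av = mat_vec(A, v)
    lam = sum(a * b for a, b in zip(v, Av))  # v^T A v, with ||v|| = 1
    return lam, v


lam, v = dominant_eigenpair(X)
print(round(lam, 1))
```

The iteration converges here because the dominant eigenvalue is strictly larger in absolute value than the second dominant one.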

We say that a procedure shifts the eigenvectors of a matrix X if it returns any matrix Y such that the dominant eigenvectors of Y are the second dominant eigenvectors of X, the second dominant eigenvectors of Y are the third dominant eigenvectors of X, etc.

As the next theorem shows, the eigenvalues of any symmetric matrix are real. It is an important property because having complex eigenvalues could complicate the resolution of Principal Component Analysis.

Theorem 3.1.1. All the eigenvalues of any symmetric matrix are real.

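Proof. Let $\bar{u}$ denote the complex conjugate of $u$. If $u$ is a vector or a matrix, $\bar{u}$ denotes the complex conjugate of each entry of $u$. Let $C$ be a real symmetric matrix and $(\lambda, v)$ one of its eigenpairs.

Then, we have the following equalities:

$$\begin{aligned}
Cv &= \lambda v && \text{By definition of eigenpair} \\
\overline{Cv} &= \overline{\lambda v} && \text{Complex conjugate of each side} \\
C\bar{v} &= \bar{\lambda}\bar{v} && \text{Since } C \text{ is real} \\
v^T C \bar{v} &= v^T \bar{\lambda} \bar{v} && \text{Multiplying by } v^T \text{ on the left} \\
v^T C \bar{v} &= \bar{\lambda} v^T \bar{v} && \text{Since } \bar{\lambda} \text{ is a scalar}
\end{aligned} \tag{3.1}$$

On the other hand, we also have the following equalities:

$$\begin{aligned}
Cv &= \lambda v && \text{By definition of eigenpair} \\
(Cv)^T &= (\lambda v)^T && \text{Transposing both sides} \\
v^T C^T &= (\lambda v)^T && \text{Property of transpose of product} \\
v^T C &= (\lambda v)^T && \text{Since } C \text{ is symmetric} \\
v^T C &= \lambda v^T && \text{Since } \lambda \text{ is a scalar} \\
v^T C \bar{v} &= \lambda v^T \bar{v} && \text{Multiplying by } \bar{v} \text{ on the right}
\end{aligned} \tag{3.2}$$

Now, from the last equality in Equations 3.1 and the last one in Equations 3.2, it follows that:

$$\lambda v^T \bar{v} = \bar{\lambda} v^T \bar{v}. \tag{3.3}$$

Eigenvectors are nonzero by definition, so $v \neq 0$. Therefore, $v^T \bar{v} = \sum_i v_i \bar{v}_i = \sum_i |v_i|^2 > 0$. Then, we can divide both sides of Equation 3.3 by $v^T \bar{v}$ and obtain $\lambda = \bar{\lambda}$, which implies that $\lambda$ is real.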

In what follows we present some results that were used in this work.

Theorem 3.1.2 (Multiplying by scalar does not change the eigenvectors). Let $A$ and $B$ be two real matrices such that $B = \alpha A$ for some nonzero $\alpha \in \mathbb{R}$. Then, $(\lambda, v)$ is an eigenpair of $A$ if, and only if, $(\alpha\lambda, v)$ is an eigenpair of $B$.

Proof. Let $\alpha$ be some nonzero real scalar. Then

$$Av = \lambda v \Leftrightarrow \alpha(Av) = \alpha(\lambda v) \Leftrightarrow Bv = (\alpha\lambda)v.$$

Corollary 3.1.3 (Multiplying by scalar does not change the order of the dominant eigenvectors). Let $\alpha$ be a positive real scalar and $A, B \in \mathbb{R}^{n \times n}$ be two real matrices such that $B = \alpha A$. Then, if $(\lambda_i, v_i)$ is an $i$-th dominant eigenpair of $A$, then $(\alpha\lambda_i, v_i)$ is an $i$-th dominant eigenpair of $B$.

Proof. Let $(\lambda_i, v_i)$ be an $i$-th dominant eigenpair of $A$, for $i \in \{1, 2, ..., \ell\}$ for some $\ell \in \mathbb{N}^*$. Then

$$|\lambda_1| > |\lambda_2| > ... > |\lambda_\ell|.$$

Therefore, since $\alpha > 0$,

$$|\alpha\lambda_1| > |\alpha\lambda_2| > ... > |\alpha\lambda_\ell|.$$

From Theorem 3.1.2, we know that the pairs $(\alpha\lambda_i, v_i)$ are the eigenpairs of $B$, so those pairs appear in the same order, i.e., $(\alpha\lambda_i, v_i)$ is an $i$-th dominant eigenpair of $B$.

Theorem 3.1.4 (Spectral Theorem for Symmetric Matrices). Suppose $A \in \mathbb{R}^{n \times n}$ is symmetric. Then, it can be written as $A = U D U^T$, where $U$ is an orthogonal matrix in which each column is a normalized eigenvector and $D$ is a diagonal matrix with the eigenvalues on the principal diagonal in an order corresponding to the columns of $U$ (that is, for $i \in \{1, 2, ..., n\}$, the pair $(D_{ii}, U_i)$ is an eigenpair, where $U_i$ is the $i$-th column of $U$).

Proof. See Chapter 5.4 of [38].

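Corollary 3.1.5 (Symmetric matrix as a sum). Let $A \in \mathbb{R}^{n \times n}$ be symmetric and $(\lambda_1, v_1), (\lambda_2, v_2), ..., (\lambda_n, v_n)$ be eigenpairs of $A$, with $\|v_i\| = 1$ for $i \in \{1, 2, ..., n\}$. Then, $A$ may be written as follows:

$$A = \sum_{i=1}^{n} \lambda_i v_i v_i^T.$$

Proof. Let $A \in \mathbb{R}^{n \times n}$ be symmetric. Then, by Theorem 3.1.4, there exist

$$U = \begin{pmatrix} U_1 & U_2 & \cdots & U_n \end{pmatrix} \quad \text{and} \quad L = \begin{pmatrix} \lambda_1 & 0 & 0 & \cdots & 0 \\ 0 & \lambda_2 & 0 & \cdots & 0 \\ 0 & 0 & \lambda_3 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda_n \end{pmatrix}$$

such that, for all $i \in \{1, 2, ..., n\}$, $U_i$ is the normalized eigenvector associated with $\lambda_i$ and $A$ can be written as $A = U L U^T$. However,

$$L U^T = \begin{pmatrix} \lambda_1 U_1^T \\ \lambda_2 U_2^T \\ \vdots \\ \lambda_n U_n^T \end{pmatrix},$$

and then

$$A = U L U^T = \sum_{i=1}^{n} \lambda_i U_i U_i^T.$$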
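To make Corollary 3.1.5 concrete, the following sketch rebuilds a small symmetric matrix from its eigenpairs. The 2 × 2 matrix, its hand-computed eigenpairs, and the helper names are assumptions chosen only for illustration.

```python
import math

# Hypothetical example: a symmetric matrix with eigenpairs computed by hand.
A = [[2.0, 1.0],
     [1.0, 2.0]]
s = 1 / math.sqrt(2)
eigenpairs = [(3.0, [s, s]),    # A [1, 1]^T  = 3 [1, 1]^T
              (1.0, [s, -s])]   # A [1, -1]^T = 1 [1, -1]^T


def outer(u, v):
    """Outer product u v^T as a list of rows."""
    return [[a * b for b in v] for a in u]


def spectral_sum(pairs):
    """Compute sum_i lambda_i v_i v_i^T for normalized eigenvectors v_i."""
    n = len(pairs[0][1])
    total = [[0.0] * n for _ in range(n)]
    for lam, v in pairs:
        term = outer(v, v)
        for i in range(n):
            for j in range(n):
                total[i][j] += lam * term[i][j]
    return total


reconstructed = spectral_sum(eigenpairs)
```

Up to floating-point rounding, `reconstructed` has the same entries as `A`.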

3.2 Abstract Algebra

In this section we define the notions of groups and rings and present some basic properties about them.

3.2.1 Groups

We say that a nonempty set $G$ and an operation $\star$ on it form a group if that operation obeys the following laws [34]:

• Closure: $\forall a, b \in G$, $a \star b \in G$.

• Associativity: $\forall a, b, c \in G$, $(a \star b) \star c = a \star (b \star c)$.

• Identity: $\exists e \in G : \forall a \in G$, $a \star e = e \star a = a$. Such $e$ is called an identity in $G$.

• Inverse: $\forall a \in G$, $\exists a^{-1} \in G : a \star a^{-1} = a^{-1} \star a = e$. Such $a^{-1}$ is called the inverse of $a$ in $G$.

It is common to write that $(G, \star)$ is a group or simply that $G$ is a group, leaving the operation implicit. The identity element $e$ is unique and, for any element $a$, so is the inverse $a^{-1}$.

The set $\mathbb{Z}$ with the conventional addition operation $+$ is a group where 0 is the identity element and, for any integer $a$, the number $-a$ is the inverse element. The set of rational numbers without the zero, $\mathbb{Q}^*$, equipped with the usual multiplication is also a group, and 1 is its identity element. Notice that the operations defined in those two groups are commutative. In such cases, we say that the group is commutative, or Abelian. A classic example of a non-Abelian group is the general linear group over $\mathbb{R}$, denoted by $GL_n(\mathbb{R})$, which is the set of $n \times n$ invertible matrices with real entries equipped with the usual matrix multiplication.

3.2.2 Rings

A set $R$ equipped with two operations, called addition and multiplication, is said to be a ring if the following laws are satisfied [34]:

• Group structure: $R$ is an Abelian group under addition.

• Closure: $\forall a, b \in R$, $a \cdot b \in R$.

• Associativity: $\forall a, b, c \in R$, $(a \cdot b) \cdot c = a \cdot (b \cdot c)$.

• Distributivity: $\forall a, b, c \in R$, $(a + b) \cdot c = a \cdot c + b \cdot c$ and $c \cdot (a + b) = c \cdot a + c \cdot b$.

A unit ring is a ring that has a multiplicative identity, i.e., an element $e_m$ such that $e_m \cdot a = a \cdot e_m = a$ for all elements $a$. If multiplication in the ring is commutative, then the ring is called a commutative ring.

The set of integers with the usual addition and multiplication is an example of a commutative unit ring. The set of $n \times n$ matrices with real entries and the natural operations is an example of a non-commutative unit ring. The set of complex numbers is also a ring, as is the set of polynomials with complex coefficients. In fact, given any ring $R$, the set of polynomials with coefficients in $R$, denoted by $R[x]$, is always a ring. In particular, $\mathbb{Z}[x]$ is the ring of polynomials with integer coefficients.

A subring of a ring $R$ is a subset $S \subset R$ that is itself a ring under the same operations of $R$. For instance, $\mathbb{Z}$ is a subring of $\mathbb{Q}$. An ideal is a subring that is also closed under multiplication by elements of the ring, i.e., given a ring $R$, we say that a subset $I$ of $R$ is an ideal in $R$ if the following conditions are true:

• I is a subring of R

• ∀(r, i) ∈ R × I, ri ∈ I and ir ∈ I.

For instance, consider the ring $\mathbb{Z}$. The set of even integers, $2\mathbb{Z}$, is an ideal because $2\mathbb{Z}$ is a ring and multiplying any integer by an even integer results in another even integer. More generally, for any integer $n \geq 1$, the set $n\mathbb{Z} = \{m \in \mathbb{Z} : n \text{ divides } m\}$ is an ideal in $\mathbb{Z}$ because it is a subring of $\mathbb{Z}$ and, for any integer $r$, if $n$ divides $m$, then $n$ divides $rm$ and $mr$, so $rm \in n\mathbb{Z}$ and $mr \in n\mathbb{Z}$.

3.2.3 Principal ideals

For any ring $R$, the set of multiples of an element $r \in R$, denoted by $\langle r \rangle$, is an ideal and it is called the principal ideal generated by $r$. Note that $n\mathbb{Z}$ is the principal ideal generated by $n$.

What is notable about ideals is the fact that they define an equivalence relation that provides a general construction to modular arithmetic in the ring, no matter what the elements of the ring are. That equivalence relation is the following: given a ring R and an ideal I of R, we say that two elements a, b ∈ R are equivalent if, and only if, a − b ∈ I. In such case, we say that a and b are congruent modulo I and we write a ≡ b (mod I).

Notice that when $I$ is a principal ideal generated by an element $n$, the condition $a - b \in I$ means that $a - b$ is a multiple of $n$, i.e., there exists $\alpha$ in the ring such that $a - b = \alpha n$. That is possible if, and only if, $a$ and $b$ have the same remainder when divided by $n$. Therefore, the number of possible remainders when dividing elements of the ring by $n$ tells us how many equivalence classes we have.

Table 3.1: Addition table of the ring $\mathbb{Z}_3$.

+ | 0 1 2
--+------
0 | 0 1 2
1 | 1 2 0
2 | 2 0 1

Table 3.2: Multiplication table of the ring $\mathbb{Z}_3$.

× | 0 1 2
--+------
0 | 0 0 0
1 | 0 1 2
2 | 0 2 1

As an example, take the ring $\mathbb{Z}$ and the ideal $3\mathbb{Z} = \{0, \pm 3, \pm 6, \pm 9, ...\}$, which is generated by 3. The integers 10 and 4 are equivalent because $10 - 4 = 6 \in 3\mathbb{Z}$. Also, the remainder of the division of 10 by 3 is 1, which is also the case for 4. The possible remainders when dividing by 3 are 0, 1, and 2, so the number of equivalence classes is 3 and those three values can represent them. This means that every integer is congruent to one of those three values modulo $3\mathbb{Z}$.

By using ideals we can construct new rings from existing ones, using the concept of quotient ring: for any ring $R$ and ideal $I$ of $R$, we define the quotient ring of $R$ by $I$, denoted by $R/I$, as the ring of equivalence classes with respect to $I$, whose operations are naturally extended from the operations of $R$ by applying them to the equivalence classes as if they were elements of $R$ and then reducing them modulo $I$. For the last example, considering the ring $\mathbb{Z}$ and the ideal $3\mathbb{Z}$, the quotient ring is $\mathbb{Z}/3\mathbb{Z} = \{0, 1, 2\}$ and its operation tables are shown in Tables 3.1 and 3.2. In general, we have $\mathbb{Z}/n\mathbb{Z} = \{0, 1, 2, ..., n - 1\}$, which is usually denoted by $\mathbb{Z}_n$.

Consider now the ring of polynomials with integer coefficients, $\mathbb{Z}[x]$, and the quadratic polynomial $p(x) = x^2 - 1$. The principal ideal generated by $p(x)$ is $\langle p(x) \rangle = \{q(x) \cdot p(x) : q(x) \in \mathbb{Z}[x]\}$. Both the polynomials $f(x) = x^3 + 3x + 2$ and $g(x) = x^4 - 3x^3 + 7x + 1$ have remainder $4x + 2$ when divided by $p(x)$; therefore, they are congruent modulo $p(x)$ and so belong to the same equivalence class. In this example, the possible remainders when dividing by $p(x)$ are all the polynomials with integer coefficients and degree less than 2, which means that $\mathbb{Z}[x]/\langle p(x) \rangle = \{ax + b : a, b \in \mathbb{Z}\}$, with operations performed modulo $p(x)$. For instance, the product $(x - 1) \cdot (x + 1)$ results in $x^2 - 1$, which is $p(x)$ itself, so it is reduced to 0 modulo $p(x)$, and then we have $(x - 1) \cdot (x + 1) = 0 \in \mathbb{Z}[x]/\langle p(x) \rangle$.
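The remainders quoted in this example can be reproduced with a short sketch of polynomial reduction modulo a monic polynomial. The coefficient-list representation (lowest degree first) and the function name `poly_mod` are assumptions made for illustration.

```python
def poly_mod(f, p):
    """Remainder of f divided by the monic polynomial p.

    Polynomials are lists of integer coefficients, lowest degree first.
    """
    r = list(f)
    deg_p = len(p) - 1
    while len(r) - 1 >= deg_p:
        lead = r[-1]                  # current leading coefficient
        shift = len(r) - 1 - deg_p    # align p with the leading term
        for i, c in enumerate(p):
            r[shift + i] -= lead * c  # subtract lead * x^shift * p(x)
        r.pop()                       # the leading coefficient is now zero
    return r


p = [-1, 0, 1]          # p(x) = x^2 - 1
f = [2, 3, 0, 1]        # f(x) = x^3 + 3x + 2
g = [1, 7, 0, -3, 1]    # g(x) = x^4 - 3x^3 + 7x + 1

print(poly_mod(f, p), poly_mod(g, p))
```

Both calls return the coefficient list of 4x + 2, confirming that f and g fall in the same equivalence class, while `poly_mod([-1, 0, 1], p)` returns only zeros.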

We can apply the same idea even to rings that are already quotient rings. For instance, $\mathbb{Z}_2$ is the quotient ring whose elements are 0 and 1; therefore, $\mathbb{Z}_2[x]$ is the ring of polynomials with binary coefficients. Now, consider $x^8 + x^4 + x^3 + x + 1$: it is an element of $\mathbb{Z}_2[x]$, thus it generates a principal ideal, and the quotient ring $\mathbb{Z}_2[x]/\langle x^8 + x^4 + x^3 + x + 1 \rangle$ is well defined. Its elements are binary polynomials with degree up to 7. In fact, this is a well-known ring in the cryptography community, because it is the ring used to generate the S-boxes of AES [12], which is perhaps the most popular block cipher in use today.
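A minimal sketch of multiplication in this quotient ring follows, encoding each binary polynomial of degree up to 7 as the bits of a byte. The byte encoding and the function name `gf256_mul` are illustrative assumptions; the constant `0x11B` is the bit pattern of x^8 + x^4 + x^3 + x + 1.

```python
AES_MODULUS = 0x11B  # bits of x^8 + x^4 + x^3 + x + 1


def gf256_mul(a, b):
    """Multiply two elements of Z_2[x]/<x^8 + x^4 + x^3 + x + 1>.

    Each element is a byte whose bit i is the coefficient of x^i.
    """
    result = 0
    while b:
        if b & 1:
            result ^= a         # addition in Z_2[x] is XOR
        a <<= 1                 # multiply a by x
        if a & 0x100:
            a ^= AES_MODULUS    # reduce modulo the degree-8 polynomial
        b >>= 1
    return result


print(hex(gf256_mul(0x57, 0x83)))
```

The product of 0x57 and 0x83 is the worked multiplication example given in the AES specification, and 0x53 and 0xCA are multiplicative inverses of each other in this ring.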

3.2.4 Maps between rings

Ring homomorphisms are functions from one ring to another that preserve the structure of the first ring. This class of functions is useful because homomorphisms allow us to map ring elements to other rings, where the representation may be more convenient and other properties are known to be true. Formally, we say that a function $\psi$ from a ring $(R_1, +_1, \cdot_1)$ to a ring $(R_2, +_2, \cdot_2)$ is a homomorphism between them if the following is true for any $a$ and $b$ in $R_1$:

• $\psi(a +_1 b) = \psi(a) +_2 \psi(b)$

• $\psi(a \cdot_1 b) = \psi(a) \cdot_2 \psi(b)$

A bijective homomorphism is called an isomorphism, and finding an isomorphism between two rings means that they are actually the same ring, represented in different ways. Isomorphisms are very important from a computing point of view because working with a specific representation may be more efficient than working with other ones.

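For instance, let

$$R = \left\{ \begin{pmatrix} a & b \\ -b & a \end{pmatrix} : a, b \in \mathbb{R} \right\}$$

and consider the usual addition and multiplication of matrices. The null matrix is the additive identity and the identity matrix is the multiplicative identity. Also, every element of $R$ has an additive inverse and every nonzero element of $R$ has a multiplicative inverse, because the two following equalities are true (the second one whenever $a^2 + b^2 \neq 0$):

• $\begin{pmatrix} a & b \\ -b & a \end{pmatrix} + \begin{pmatrix} -a & -b \\ b & -a \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$

• $\begin{pmatrix} a & b \\ -b & a \end{pmatrix} \cdot \begin{pmatrix} \frac{a}{a^2+b^2} & -\frac{b}{a^2+b^2} \\ \frac{b}{a^2+b^2} & \frac{a}{a^2+b^2} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$

Therefore, $R$ is a ring under those operations. But it turns out that $R$ is isomorphic to the set of complex numbers via the isomorphism

$$\psi : R \to \mathbb{C}, \qquad \begin{pmatrix} a & b \\ -b & a \end{pmatrix} \mapsto a + ib.$$

Thus, anyone who wants to work with $R$ can work with $\mathbb{C}$ instead, and the converse is also true.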
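The homomorphism properties of ψ can be checked numerically on sample values. In the sketch below, the helper names (`to_matrix`, `psi`, `mat_add`, `mat_mul`) and the sample numbers are assumptions; Python's built-in `complex` type plays the role of ℂ.

```python
def to_matrix(z):
    """Inverse of psi: map a + ib to the matrix [[a, b], [-b, a]]."""
    return [[z.real, z.imag], [-z.imag, z.real]]


def psi(m):
    """Map the matrix [[a, b], [-b, a]] to the complex number a + ib."""
    return complex(m[0][0], m[0][1])


def mat_add(x, y):
    return [[x[i][j] + y[i][j] for j in range(2)] for i in range(2)]


def mat_mul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


z, w = complex(3, 2), complex(1, -4)
A, B = to_matrix(z), to_matrix(w)

# psi turns matrix operations into the corresponding complex operations.
print(psi(mat_add(A, B)) == z + w, psi(mat_mul(A, B)) == z * w)
```

The sample values have small integer parts, so the floating-point comparisons are exact here; a tolerance would be needed for arbitrary reals.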

3.2.5 Chinese Remainder Theorem as a Ring Isomorphism

The Chinese remainder theorem (CRT) states that, given the remainders of the division of an integer $x$ by several pairwise coprime integers, it is possible to uniquely determine the remainder of the division of $x$ by the product of those integers, i.e., given integers $m_1, m_2, ..., m_n$ such that $\gcd(m_i, m_j) = 1$ if $i \neq j$, the system

$$\begin{cases} x \equiv a_1 \pmod{m_1} \\ x \equiv a_2 \pmod{m_2} \\ \quad \vdots \\ x \equiv a_n \pmod{m_n} \end{cases}$$

has a unique solution $x$ modulo $M = m_1 \cdot m_2 \cdots m_n$, and this solution is, by construction,

$$x = \sum_{i=1}^{n} a_i z_i \frac{M}{m_i},$$

where $z_i$ is an integer such that $z_i \frac{M}{m_i} \equiv 1 \pmod{m_i}$.

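Notice that the CRT may be viewed as an injective function from $\mathbb{Z}_{m_1} \times \mathbb{Z}_{m_2} \times \cdots \times \mathbb{Z}_{m_n}$ to $\mathbb{Z}_M$, because it relates any tuple $(a_1, a_2, ..., a_n)$ to a unique $x \in \mathbb{Z}_M$. Also, it is easy to see that $\mathrm{CRT}^{-1}(x) = (x \bmod m_1, x \bmod m_2, ..., x \bmod m_n)$ is the inverse function of CRT, which means that CRT is actually a bijection from $\mathbb{Z}_{m_1} \times \mathbb{Z}_{m_2} \times \cdots \times \mathbb{Z}_{m_n}$ to $\mathbb{Z}_M$.

Moreover, we can see that CRT is actually a ring isomorphism. To understand that, it is easier to observe that $\mathrm{CRT}^{-1}$ is an isomorphism, because

$$\begin{aligned} \mathrm{CRT}^{-1}(x) + \mathrm{CRT}^{-1}(y) &= (x \bmod m_1, ..., x \bmod m_n) + (y \bmod m_1, ..., y \bmod m_n) \\ &= ((x + y) \bmod m_1, ..., (x + y) \bmod m_n) \\ &= \mathrm{CRT}^{-1}(x + y) \end{aligned}$$

and

$$\begin{aligned} \mathrm{CRT}^{-1}(x) \cdot \mathrm{CRT}^{-1}(y) &= (x \bmod m_1, ..., x \bmod m_n) \cdot (y \bmod m_1, ..., y \bmod m_n) \\ &= ((x \cdot y) \bmod m_1, ..., (x \cdot y) \bmod m_n) \\ &= \mathrm{CRT}^{-1}(x \cdot y). \end{aligned}$$

A bijective function is an isomorphism if, and only if, its inverse function is also an isomorphism.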

Having an isomorphism between $\mathbb{Z}_{m_1} \times \cdots \times \mathbb{Z}_{m_n}$ and $\mathbb{Z}_M$ means we can work independently and in parallel with $n$ smaller integers instead of working with a single big integer, because elements of $\mathbb{Z}_M$ are bigger than elements of $\mathbb{Z}_{m_i}$. This can be handy when we need to perform operations over big integers, which occurs very often in public key cryptography scenarios.
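The construction above translates almost literally into code. The sketch below is a minimal illustration with assumed function names; it relies on Python's three-argument `pow` with exponent −1 (available since Python 3.8) to obtain each z_i.

```python
from math import prod


def crt(residues, moduli):
    """Map (a_1, ..., a_n) to the unique x modulo M = m_1 * ... * m_n."""
    M = prod(moduli)
    x = 0
    for a, m in zip(residues, moduli):
        Mi = M // m
        z = pow(Mi, -1, m)  # z_i such that z_i * (M / m_i) = 1 (mod m_i)
        x += a * z * Mi
    return x % M


def crt_inverse(x, moduli):
    """Map x in Z_M to its tuple of remainders."""
    return tuple(x % m for m in moduli)


moduli = (3, 5, 7)  # pairwise coprime, M = 105
x = crt((2, 3, 2), moduli)
print(x, crt_inverse(x, moduli))

# Working componentwise on small residues matches working modulo M.
a, b = 23, 52
componentwise = tuple((u * v) % m for u, v, m in
                      zip(crt_inverse(a, moduli), crt_inverse(b, moduli), moduli))
```

Here `componentwise` equals `crt_inverse((a * b) % 105, moduli)`, which is the multiplicative half of the isomorphism property.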

Since any isomorphism from a ring R yields an isomorphism from its polynomial ring R[x] when applied coefficient-wise, we also have an isomorphism between ZM[x] and (Zm1 × · · · × Zmn)[x] given by:

$$\mathrm{CRT}^{-1}(a_n x^n + a_{n-1} x^{n-1} + ... + a_1 x + a_0) = \mathrm{CRT}^{-1}(a_n) x^n + ... + \mathrm{CRT}^{-1}(a_1) x + \mathrm{CRT}^{-1}(a_0).$$

Moreover, the Chinese remainder theorem can be generalized to polynomial rings using the concept of coprime polynomials and modular operations. The polynomial version of CRT states that, given polynomial moduli $m_1(x), m_2(x), ..., m_n(x)$ such that $\gcd(m_i(x), m_j(x)) = 1$ if $i \neq j$, the following system has a unique solution $p(x)$ modulo $M(x) = m_1(x) \cdot m_2(x) \cdots m_n(x)$:

$$\begin{cases} p(x) \equiv a_1(x) \pmod{m_1(x)} \\ p(x) \equiv a_2(x) \pmod{m_2(x)} \\ \quad \vdots \\ p(x) \equiv a_n(x) \pmod{m_n(x)} \end{cases}$$

This solution is, by construction,

$$p(x) = \sum_{i=1}^{n} a_i(x) z_i(x) \frac{M(x)}{m_i(x)},$$

where $z_i(x)$ is a polynomial such that $z_i(x) \frac{M(x)}{m_i(x)} \equiv 1 \pmod{m_i(x)}$.

This variant of CRT gives us an isomorphism between $\mathbb{Z}[x]/\langle M(x) \rangle$ and the product ring $\mathbb{Z}[x]/\langle m_1(x) \rangle \times ... \times \mathbb{Z}[x]/\langle m_n(x) \rangle$.

3.2.6 Multiplicative depth

Before the idea of homomorphic encryption is discussed, it is important to have a clear understanding of a simple concept: the multiplicative depth. It is a quantity that tells us how many products we have to perform in sequence in order to evaluate an algorithm. It is easily visualized when we represent the algorithm as a circuit in which each multiplication that depends on the result of a previous multiplication is placed in a subsequent level of the circuit.

For instance, if we want to multiply four numbers $a$, $b$, $c$, and $d$, we can first multiply $a$ by $b$, then multiply the result of this operation by $c$, and then multiply that result by $d$, ending up with three multiplications in total and a multiplicative depth equal to three, as shown in figure 3.2. Instead, we can first multiply $a$ by $b$ and $c$ by $d$, and then multiply the results of those two products, as shown in figure 3.3, which still represents an algorithm with three multiplications, but with multiplicative depth equal to two.
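The two evaluation orders can be compared programmatically. The sketch below assumes a hypothetical encoding of circuits as nested tuples `(op, left, right)` with string leaves; only multiplications add to the depth.

```python
def mult_depth(expr):
    """Multiplicative depth of an expression tree.

    A leaf is a string naming an input; an inner node is a tuple
    (op, left, right) with op equal to "add" or "mul".
    """
    if isinstance(expr, str):
        return 0
    op, left, right = expr
    depth = max(mult_depth(left), mult_depth(right))
    return depth + 1 if op == "mul" else depth


# ((a * b) * c) * d: each product waits for the previous one.
sequential = ("mul", ("mul", ("mul", "a", "b"), "c"), "d")
# (a * b) * (c * d): the two inner products are independent.
balanced = ("mul", ("mul", "a", "b"), ("mul", "c", "d"))

print(mult_depth(sequential), mult_depth(balanced))
```

Both trees contain three multiplications, yet the function reports depth 3 for the first and depth 2 for the second, matching the discussion above.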

3.3 Homomorphic Encryption

Generally, cryptographic schemes are designed to guarantee privacy, making it possible to send messages securely via insecure channels.

In 1978, Rivest, Adleman and Dertouzos [35] proposed the concept of privacy homomorphisms, consisting of cryptographic schemes that, in addition to ensuring privacy, would be able to evaluate some functions over encrypted data. That concept is now known as homomorphic encryption (HE).

[Figure 3.2: Circuit with multiplicative depth equal to 3, computing ((a · b) · c) · d from the inputs a, b, c, and d.]

[Figure 3.3: Circuit computing (a · b) · (c · d) from the inputs a, b, c, and d.]

3.3.1 Classification

A cryptographic scheme is homomorphic for an operation $\star$ if performing this operation over plaintexts is equivalent to performing it over ciphertexts. For instance, considering the addition operation, the sum of two ciphertexts generates a third ciphertext that has to decrypt to the sum of the two corresponding plaintexts.

Consider the (unpadded version of) RSA: the private key is a triple (p, q, d) and the public key is a pair (n, e), where p and q are primes, n = pq, e is an integer coprime with (p − 1)(q − 1), and d is an integer such that d · e ≡ 1 (mod (p − 1)(q − 1)) [26].

In order to encrypt a message $m \in \mathbb{Z}_n^*$, we calculate the ciphertext $c$ in the following way:

$$c = m^e \pmod{n}.$$

To decrypt $c$ and recover $m$, we just have to perform another exponentiation:

$$m = c^d \pmod{n}.$$

Hence, RSA is multiplicatively homomorphic because, taking two ciphertexts $c_1 = m_1^e \pmod{n}$ and $c_2 = m_2^e \pmod{n}$, the product of those ciphertexts is $c_1 \cdot c_2 = (m_1^e)(m_2^e) = (m_1 m_2)^e \pmod{n}$, which is exactly the encryption of $m_1 \cdot m_2$, and consequently decrypts to that product.
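The multiplicative property can be observed with a toy instance of unpadded RSA. The primes, the exponent, and the messages below are assumptions picked only to keep the numbers readable; they offer no security.

```python
# Toy parameters (assumption: far too small for any real use).
p, q = 61, 53
n = p * q
phi = (p - 1) * (q - 1)
e = 17
d = pow(e, -1, phi)  # modular inverse of e; requires Python 3.8+


def encrypt(m):
    return pow(m, e, n)


def decrypt(c):
    return pow(c, d, n)


m1, m2 = 7, 6
c1, c2 = encrypt(m1), encrypt(m2)

# Multiply the ciphertexts only; no plaintext is touched here.
c_prod = (c1 * c2) % n
print(decrypt(c_prod))
```

Decrypting `c_prod` yields the product of the two plaintexts, as long as that product is smaller than n.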

When a cryptographic scheme is additively and multiplicatively homomorphic and it is possible to perform any sequence of those operations over ciphertexts, it is said to be a Fully Homomorphic Encryption (FHE) scheme.

If there is some limit on the number of operations that can be performed in sequence, then the scheme is called Somewhat Homomorphic Encryption (SHE). If this limit is parameterizable, that is, if it is possible to choose the maximum length of such sequences of operations by setting some parameter $L$ (which usually affects the other parameters), then the scheme is called Leveled Homomorphic Encryption. In general, the number of homomorphic multiplications is very limited, and that parameter $L$ represents the maximum multiplicative depth accepted by the scheme. If the multiplicative depth of the function to be evaluated is greater than $L$, then the results will probably decrypt to wrong values.

For instance, if a homomorphic scheme supports only two products in sequence, then, given ciphertexts $c_1$, $c_2$, $c_3$, and $c_4$, we can evaluate $c_1 \cdot c_2$, or $c_1 \cdot c_2 + c_3 \cdot c_2$, or $c_1 \cdot c_2 \cdot c_4$, but not $c_1 \cdot c_2 \cdot c_3 \cdot c_4$ computed sequentially as $((c_1 \cdot c_2) \cdot c_3) \cdot c_4$, because the multiplicative depth of that last procedure is equal to three.

3.3.2 Security considerations

In HE schemes something usually considered as an undesirable property is intrinsically present: the malleability, i.e., it is possible to manipulate ciphertexts with predictable results in the plaintexts.

The strongest notion of security usually applied to public key encryption schemes is known as adaptive chosen-ciphertext indistinguishability security (IND-CCA2 security).


It is defined as a game involving two parties: the challenger, which represents the honest party, and the adversary, which represents an agent trying to break the security of the scheme in polynomial time. The challenger generates the keys and provides encryption and decryption oracles to the adversary, who can interact with those oracles to encrypt or decrypt arbitrary messages. The adversary’s goal is then to distinguish if a given ciphertext is an encryption of one message m0 or of a second message m1. Such security definition models a scenario where an attacker has access to a cryptosystem infrastructure.

Formally, this game is defined as follows:

1. Using some security parameter λ, the challenger generates the secret and the public keys, sk and pk, and publishes pk.

2. The adversary may perform a polynomial number of polynomial-time (in the length of $1^\lambda$) operations, including submitting any message $m$ to be encrypted by the encryption oracle and any ciphertext $c$ to be decrypted by the decryption oracle.

3. The adversary submits two different messages $m_0$ and $m_1$ to the challenger.

4. The challenger samples a bit $b$ uniformly from $\{0, 1\}$ and encrypts $m_b$: $c = \mathrm{Enc}(pk, m_b)$. Then $c$ is sent to the adversary.

5. The adversary may perform more operations, including new queries to the oracles, but without submitting $c$ to the decryption oracle.

6. Then, the adversary outputs a guess $b'$ for the bit $b$.

A scheme is IND-CCA2 secure if, for any probabilistic polynomial-time adversary, the advantage in guessing the value of $b$, that is, $|\Pr[b' = b] - 1/2|$, is negligible.

Since IND-CCA2 security implies non-malleability, the contrapositive is also true: any malleable scheme is not IND-CCA2 secure. In particular, no homomorphic encryption scheme is IND-CCA2 secure [17]. Considering that, the strongest security definition an HE scheme may satisfy is the notion of non-adaptive chosen-ciphertext indistinguishability security (IND-CCA1 security), which is defined just like IND-CCA2, but without step 5, which means the adversary cannot interact with the oracles after submitting the messages and receiving the challenge ciphertext [30]. Even so, several homomorphic encryption schemes only satisfy indistinguishability under chosen-plaintext attack (IND-CPA security) [13], which considers a scheme to be secure if no probabilistic adversary with access only to an encryption oracle can decide in polynomial time, with non-negligible advantage, whether a given ciphertext is an encryption of a plaintext $m_0$ or $m_1$.
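The link between malleability and the failure of IND-CCA2 security can be illustrated with a sketch of the game played against toy unpadded RSA. Every name and parameter below is an assumption made for illustration. Unpadded RSA is deterministic and therefore already fails weaker notions; the point here is only that the adversary wins using nothing but the homomorphic property and a decryption query on a ciphertext different from the challenge.

```python
import random

# Toy unpadded RSA (assumption: parameters far too small for real use).
P, Q, E = 61, 53, 17
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))


def enc(m):
    return pow(m, E, N)


def dec(c):
    return pow(c, D, N)


def adversary_wins(rng):
    """Play one round of the game; return True if the guess b' equals b."""
    m0, m1 = 10, 33
    b = rng.randrange(2)               # challenger's secret bit
    challenge = enc((m0, m1)[b])

    def decryption_oracle(c):
        if c == challenge:             # the only forbidden query (step 5)
            raise ValueError("challenge ciphertext cannot be queried")
        return dec(c)

    # Malleability: turn Enc(m_b) into Enc(2 * m_b) without any key.
    mauled = (challenge * enc(2)) % N
    doubled = decryption_oracle(mauled)
    guess = 0 if doubled == (2 * m0) % N else 1
    return guess == b


rng = random.Random(0)
wins = sum(adversary_wins(rng) for _ in range(100))
print(wins)
```

The adversary guesses correctly in every round, so its advantage is as large as it can be.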

3.3.3 The YASHE cryptosystem

In this work, the more practical variant of the leveled homomorphic encryption scheme known as YASHE [7, 29, 14] was used. This scheme is based on a modified version of NTRU [36]; therefore, it works over rings. Its message space is the ring $R_t = \mathbb{Z}_t[x]/\langle \Phi_d(x) \rangle$, where $\Phi_d(x)$ is the $d$-th cyclotomic polynomial. Any element of this ring may be viewed as a polynomial whose degree is less than $n = \varphi(d)$, the degree of $\Phi_d(x)$, and whose coefficients are integers in $\mathbb{Z}_t = [-\frac{t}{2}, \frac{t}{2}) \cap \mathbb{Z}$. Its ciphertext space is the ring $R_q = \mathbb{Z}_q[x]/\langle \Phi_d(x) \rangle$, where $q$ is a prime number and $\mathbb{Z}_q = [-\frac{q}{2}, \frac{q}{2}) \cap \mathbb{Z}$.

This scheme uses two discrete bounded probability distributions on $R_q$: one used to sample small polynomials in the key generation procedure, denoted by $\chi_{key}$, and the other used to sample noise polynomials in the encryption step, denoted by $\chi_{err}$. In practice, $\chi_{err}$ is often chosen to be a truncated discrete Gaussian distribution and $\chi_{key}$ to be a distribution on polynomials with small coefficients (for instance, coefficients in $\{-1, 0, 1\}$).

In addition to the secret and the public keys, the scheme also has an evaluation key $evk$, which is used by the cloud to perform homomorphic multiplications. This key is defined in terms of the powersOf function, which receives a polynomial $p$ and returns a vector of polynomials $(w^0 p, w^1 p, ..., w^{\ell_{w,q}-1} p) \in R_q^{\ell_{w,q}}$, where each position holds the product of $p$ by a power of the integer $w$ and $\ell_{w,q} = \lfloor \log_w(q) + 1 \rfloor$.

There is also an auxiliary function named keySwitch, which can transform a ciphertext encrypted under a key $sk_1$ into another one encrypted under a key $sk_2$ using an evaluation key generated from $sk_1$. This function is used in the homomorphic multiplication because the simple product between two ciphertexts $c_1$ and $c_2$ generates a third ciphertext encrypted under the product of their secret keys; keySwitch is then applied to transform this third ciphertext into a ciphertext encrypted under the original secret key.

The YASHE cryptosystem can be described as follows:

HE.paramsGen($\lambda$): Given the security parameter $\lambda$, choose integers $w$, $t$, and $q$ such that $w > 1$, $1 < t < q$, and $q$ is prime. Choose a positive integer $d$ and calculate $\Phi_d(x)$. Fix the two distributions $\chi_{key}$ and $\chi_{err}$. Output $(d, q, t, \chi_{key}, \chi_{err}, w)$.

HE.keyGen($d, q, t, \chi_{key}, \chi_{err}, w$): Sample $f'$ and $g$ from $\chi_{key}$ and let $f = tf' + 1 \in R_q$. If $f$ is not invertible in $R_q$, pick a new $f'$ and recalculate $f$. Compute the inverse $f^{-1}$ of $f$ in $R_q$ and set $h = tgf^{-1} \in R_q$. Sample two vectors $e, s$ from $\chi_{err}^{\ell_{w,q}}$ and compute $\gamma = \mathrm{powersOf}(f) + e + hs \in R_q^{\ell_{w,q}}$. Output $(pk, sk, evk) = (h, f, \gamma)$.

HE.enc($pk, m$): Take a plaintext $m \in R_t$, sample $s$ and $e$ from $\chi_{err}$, and output the ciphertext $c = \lfloor \frac{q}{t} \rfloor m + e + pk \cdot s \in R_q$.

HE.dec($sk, c$): Compute $z = sk \cdot c \in R_q$ and output $m = \lfloor \frac{t}{q} \cdot z \rceil \in R_t$.

HE.add($c_1, c_2$): Output $c_{add} = c_1 + c_2 \in R_q$.

HE.prod($c_1, c_2$): Compute $c_3 = \lfloor \frac{t}{q} c_1 c_2 \rceil \in R_q$ and output $c_{mult} = \mathrm{keySwitch}(c_3, evk)$.

This scheme has an implicit parameter $L$ that specifies the number of multiplications that can be performed in sequence, i.e., the maximum multiplicative depth that the circuits to be evaluated may have. This parameter is a function of the other parameters. Depending on the choice of the parameters, the ring $R_t$ may be isomorphic to a direct product of smaller rings, and this fact is usually exploited to reduce a big plaintext to several small ones and work with them independently, or to pack several plaintexts into a single (and bigger) one and work with it in a way that is equivalent to working with all the packed plaintexts in parallel. This is done using the Chinese remainder theorem [44, 29, 7] in two ways:

• Over the coefficients: when $t$ is a product of pairwise coprime numbers $t_1, t_2, ..., t_\ell$, the CRT is an isomorphism between $R_{t_1} \times R_{t_2} \times ... \times R_{t_\ell}$ and $R_t$.

• Over the cyclotomic polynomial: even though the cyclotomic polynomial $\Phi_d(x)$ is irreducible over $\mathbb{Q}$, it may factor over $\mathbb{Z}_t$, depending on the choice of $t$. In this case, $\Phi_d(x)$ may be written as a product of polynomials $f_1(x) \times f_2(x) \times ... \times f_\ell(x)$ and the (polynomial) CRT is an isomorphism between $\mathbb{Z}_t[x]/\langle f_1(x) \rangle \times ... \times \mathbb{Z}_t[x]/\langle f_\ell(x) \rangle$ and $R_t$.

3.4 Order-preserving Encryption

“Order-preserving symmetric encryption (OPE) is a deterministic encryption scheme which encryption function preserves numerical ordering of the plaintexts” [5]. In other words, given two plaintexts m1 and m2, and their corresponding ciphertexts c1 and c2 encrypted with an OPE scheme, the following implication is true:

m1 ≤ m2 ⇒ c1 ≤ c2.

Note that OPE schemes are deterministic by definition: given plaintexts $m_1$ and $m_2$ and their corresponding ciphertexts $c_1$ and $c_2$, if $m_1 = m_2$, then $m_1 \leq m_2$ and $m_2 \leq m_1$, which implies $c_1 \leq c_2$ and $c_2 \leq c_1$, and therefore $c_1 = c_2$. Additionally, any public-key scheme whose encryption preserves the order is insecure, because anyone holding the public key would be able to encrypt values and use the order to discover the plaintext associated with a given ciphertext $c$.

Therefore, since this kind of scheme leaks the order relation among the plaintexts and is deterministic, it does not fulfill the requirements of the IND-CPA security notion (indistinguishability against chosen-plaintext attack) [6]. Thus, other security notions are applied to OPE, namely pseudorandom order-preserving function against chosen-ciphertext attack (POPF-CCA), which is an adaptation of the security notion of pseudorandom function (which defines security in terms of indistinguishability from truly random functions). An OPE scheme is POPF-secure if its encryption algorithm is indistinguishable from a random order-preserving function.

Let $M$ and $N$ be two positive integers such that $N > M$, $[M] = \{1, 2, ..., M\}$, $[N] = \{1, 2, ..., N\}$, and $\mathrm{OPF}_{[M],[N]}$ the set of order-preserving functions from $[M]$ to $[N]$. The scheme used in this work, the one presented in [5], achieves POPF security by simulating a function randomly sampled from $\mathrm{OPF}_{[M],[N]}$ as its encryption algorithm. This is done by exploiting the fact that the number of functions in $\mathrm{OPF}_{[M],[N]}$ equals $\binom{N}{M}$ and so, fixing a $y \in [N]$ and sampling a random $f$ from $\mathrm{OPF}_{[M],[N]}$, the probability of having $f(x) \leq y < f(x + 1)$ for any $x \in [M]$ is

$$P(f(x) \leq y < f(x + 1)) = \frac{\binom{y}{x} \cdot \binom{N-y}{M-x}}{\binom{N}{M}}.$$

But that is exactly the probability of having $x$ successes in $y$ draws following a hypergeometric distribution (HGD) with parameters $M$ and $N$. Therefore, sampling a random function from $\mathrm{OPF}_{[M],[N]}$ is equivalent to running an experiment following the HGD.
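For small parameters, the counting argument behind this probability can be verified exhaustively. The sketch below assumes that order-preserving functions are strictly increasing, so that each one corresponds to an M-element subset of [N]; the values of M and N and the helper name are illustrative.

```python
from itertools import combinations
from math import comb


def count_functions(M, N, x, y):
    """Count strictly increasing f: [M] -> [N] with exactly x values <= y.

    Having exactly x image points not exceeding y is the condition
    f(x) <= y < f(x + 1).
    """
    return sum(
        1
        for image in combinations(range(1, N + 1), M)  # sorted image of f
        if sum(1 for value in image if value <= y) == x
    )


M, N = 3, 7
total = sum(1 for _ in combinations(range(1, N + 1), M))

mismatches = [
    (x, y)
    for y in range(1, N + 1)
    for x in range(0, M + 1)
    if count_functions(M, N, x, y) != comb(y, x) * comb(N - y, M - x)
]
print(total, mismatches)
```

The total number of functions matches the binomial coefficient for N and M, and the list of mismatches is empty, which agrees with the hypergeometric numerator for every pair (x, y).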

A keyed pseudorandom function $p$ is used to seed the HGD sampler. The scheme is parametrized by two positive integers $M$ and $N$, with $N > M$. They define the plaintext message space, which is the set $[M]$, and the ciphertext space, $[N]$. The scheme is defined as follows:

OPE.keyGen($M, N, \lambda$): This is a probabilistic algorithm that receives the security parameter $\lambda$ and outputs the secret key $K$, which is a key to the pseudorandom function $p$.

OPE.enc($K, m$): Encrypt a plaintext $m \in [M]$ using $K$ and following algorithm 1, passing $[M]$ as the set $D$ and $[N]$ as the set $R$.

OPE.dec($K, c$): Decrypt a ciphertext $c \in [N]$ using $K$ and following algorithm 2, passing $[M]$ as the set $D$ and $[N]$ as the set $R$.

Algorithm 1 OPE Encryption

Input: K, sets of consecutive integers D and R, m ∈ D

M = |D|
N = |R|
d = min(D) − 1
r = min(R) − 1
y = r + ⌈N/2⌉
if M = 1 then
    cc = p(K, D||R||1||m)    ▷ Concatenate D, R, 1, and m represented as strings
    Sample c from R over the coins of cc
else
    cc = p(K, D||R||0||y)    ▷ Concatenate D, R, 0, and y represented as strings
    x = HGD(D, R, y, cc)
    if m ≤ x then
        D = {d + 1, d + 2, ..., x}
        R = {r + 1, r + 2, ..., y}
    else
        D = {x + 1, x + 2, ..., d + M}
        R = {y + 1, y + 2, ..., r + N}
    end if
    c = OPE Encryption(K, D, R, m)
end if
return c
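The recursion of Algorithm 1, together with its decryption counterpart (Algorithm 2), can be prototyped in a few lines. The following is a sketch under loud assumptions, not the implementation of [5]: HMAC-SHA256 seeding Python's `random.Random` stands in for the pseudorandom function p and its coins, the hypergeometric sampler is a naive draw-by-draw simulation, sets of consecutive integers are passed as inclusive bounds, and all function names are hypothetical. It is not secure and only illustrates the control flow.

```python
import hashlib
import hmac
import random


def coins(key, *parts):
    """Stand-in for p(K, ...): a deterministic random stream per input."""
    message = "|".join(str(part) for part in parts).encode()
    return random.Random(hmac.new(key, message, hashlib.sha256).digest())


def hgd(rng, marked, population, draws):
    """Naive hypergeometric sampler (draws without replacement)."""
    hits = 0
    for _ in range(draws):
        if rng.randrange(population) < marked:
            hits += 1
            marked -= 1
        population -= 1
    return hits


def split(key, d_lo, d_hi, r_lo, r_hi):
    """Shared step: range midpoint y and domain split point x."""
    M, N = d_hi - d_lo + 1, r_hi - r_lo + 1
    y = (r_lo - 1) + (N + 1) // 2              # y = r + ceil(N / 2)
    rng = coins(key, d_lo, d_hi, r_lo, r_hi, 0, y)
    x = (d_lo - 1) + hgd(rng, M, N, y - r_lo + 1)
    return x, y


def ope_enc(key, d_lo, d_hi, r_lo, r_hi, m):
    if d_lo == d_hi:                           # M = 1: sample c from R
        return coins(key, d_lo, d_hi, r_lo, r_hi, 1, m).randint(r_lo, r_hi)
    x, y = split(key, d_lo, d_hi, r_lo, r_hi)
    if m <= x:
        return ope_enc(key, d_lo, x, r_lo, y, m)
    return ope_enc(key, x + 1, d_hi, y + 1, r_hi, m)


def ope_dec(key, d_lo, d_hi, r_lo, r_hi, c):
    if d_lo > d_hi:                            # empty domain: invalid c
        return None
    if d_lo == d_hi:
        m = d_lo
        w = coins(key, d_lo, d_hi, r_lo, r_hi, 1, m).randint(r_lo, r_hi)
        return m if w == c else None
    x, y = split(key, d_lo, d_hi, r_lo, r_hi)
    if c <= y:
        return ope_dec(key, d_lo, x, r_lo, y, c)
    return ope_dec(key, x + 1, d_hi, y + 1, r_hi, c)


key, M, N = b"demo key", 16, 256
ciphertexts = [ope_enc(key, 1, M, 1, N, m) for m in range(1, M + 1)]
print(ciphertexts)
```

Because both routines derive the same coins from the same (D, R) pair, decryption retraces the path taken by encryption, and two different plaintexts are always separated by some midpoint y, which keeps the ciphertexts in order.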

Algorithm 2 OPE Decryption

Input: K, sets of consecutive integers D and R, c ∈ R

M = |D|
N = |R|
d = min(D) − 1
r = min(R) − 1
y = r + ⌈N/2⌉
if M = 1 then
    m = min(D)
    cc = p(K, D||R||1||m)    ▷ Represent D, R, 1, and m as strings and concatenate them
    Sample w from R over the coins of cc
    if w = c then
        return m
    else
        return ERROR
    end if
else
    cc = p(K, D||R||0||y)    ▷ Represent D, R, 0, and y as strings and concatenate them
    x = HGD(D, R, y, cc)
    if c ≤ y then
        D = {d + 1, d + 2, ..., x}
        R = {r + 1, r + 2, ..., y}
    else
        D = {x + 1, x + 2, ..., d + M}
        R = {y + 1, y + 2, ..., r + N}
    end if
    m = OPE Decryption(K, D, R, c)
end if
return m

3.5 Centralized versus Multi-party Computation

In the cloud computing model, clients are supposed to send their data to the cloud to minimize the effort necessary to manage and process the data. Clouds are usually supposed to have powerful capabilities in terms of storage, memory, and processing, while clients are supposed to be weaker in computational terms.

When some task is to be performed over data stored in the cloud, clients do not have to process the data; ideally, the cloud performs the task alone, without interacting with the clients. That is, clients send their data to be stored in the cloud and then simply request the results of specific functions, which the cloud computes as a black box.

This scenario is the centralized one. In secure outsourced computation, in addition to following the cloud computing model, the cloud servers must learn neither the data being processed nor the responses sent to the clients.

On the other hand, in multi-party computation several parties process some data interactively. Generally, each party holds some portion of the data, performs some calculations locally, and then sends the results to the other parties. The secure multi-party model assumes that ℓ participants, each one possessing some data d_i, want to interact to compute a public function F(d_1, d_2, ..., d_ℓ) without revealing their data to the other participants, i.e., after the end of the protocol, each participant i must not have learnt any d_j for i ≠ j.
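As a toy illustration of this model, the sketch below lets the participants compute one particular public function, the sum of their inputs, using additive secret sharing. Both the choice of function and the sharing technique are assumptions made for the example and are not taken from the works discussed in this chapter. The code simulates all parties in a single process, assumes that every participant follows the protocol honestly, and uses a hypothetical modulus Q.

```python
import secrets

Q = 2**61 - 1  # hypothetical public modulus, larger than any possible sum


def share(value, n):
    """Split `value` into n random shares that add up to it modulo Q."""
    shares = [secrets.randbelow(Q) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % Q)
    return shares


def secure_sum(inputs):
    """Simulate the parties computing the sum of their private inputs."""
    n = len(inputs)
    # dealt[i][j] is the share that party i sends to party j.
    dealt = [share(d, n) for d in inputs]
    # Party j only sees one random-looking share of each input and
    # publishes the sum of the shares it received.
    partial = [sum(dealt[i][j] for i in range(n)) % Q for j in range(n)]
    # The published partial sums reveal only the total.
    return sum(partial) % Q


total = secure_sum([3, 5, 9])
```

Each individual share is uniformly distributed modulo Q, so a participant that sees fewer than all the shares of an input learns nothing about it; only the final result becomes public.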

In this work we focus on the centralized scenario, developing algorithms that the cloud can evaluate over encrypted data without requiring help from the clients. Even so, in Section 3.6 we also discuss some articles that deal with multi-party computation, because they also face the problem of running machine learning algorithms in a privacy-preserving way.

3.6 Related Works

Several authors have already studied the application of homomorphic encryption to compute machine learning tasks in the cloud with some privacy guarantees. The authors of [28] discussed the practical limitations of FHE schemes and showed that SHE is an alternative for the problem of computing over encrypted data. As a proof of concept, they implemented simple functions homomorphically: the average of n integers, the standard deviation, and logistic regression.

Following that work, in the paper [21], LHE was used to evaluate the binary classifiers Linear Means and Fisher's Linear Discriminant over encrypted data. That was done by encoding real numbers using a scaling technique and by eliminating the divisions present in the original algorithms. The authors also briefly mentioned that PCA could be solved homomorphically by gradient descent, but did not provide an actual construction. Bost et al. [8] studied the classification problem using modified versions of more complex classifiers, such as decision trees and simple Bayesian inference for medical applications. As a result, the authors constructed privacy-preserving classifiers, but at the price of a high communication cost due to consecutive communication rounds. Other works dealt with the problem of large-scale statistical analysis [40] for linear regression and other useful metrics for machine learning tasks, or with the design of interactive protocols for clustering over encrypted data [23], such as applying the classical k-means algorithm [32] in a two-party execution.

