Online facility location and Steiner problems = Problemas online de localização de instalações e de Steiner


Mário César San Felice

“Online Facility Location and Steiner Problems”

“Problemas Online de Localização de Instalações e de Steiner”

CAMPINAS

2015


Catalog record

University of Campinas (Universidade Estadual de Campinas)

Library of the Institute of Mathematics, Statistics and Scientific Computing
Ana Regina Machado - CRB 8/5467

San Felice, Mário César,

Sa57o Online facility location and Steiner problems / Mário César San Felice. – Campinas, SP : [s.n.], 2015.

Advisor: Orlando Lee.

Thesis (doctorate) – University of Campinas, Institute of Computing.

1. Combinatorial optimization. 2. Algorithms. I. Lee, Orlando, 1969-. II. University of Campinas. Institute of Computing. III. Title.

Information for the Digital Library

Title in another language: Problemas online de localização de instalações e de Steiner

Keywords in English: Combinatorial optimization; Algorithms

Area of concentration: Computer Science

Degree: Doctor in Computer Science

Examining committee: Orlando Lee [Advisor]; Eduardo Candido Xavier; Rafael Crivellari Saliba Schouery; Daniel Morgato Martin; Cristina Gomes Fernandes

Defense date: 13-04-2015

Graduate program: Computer Science


Institute of Computing / Instituto de Computação

University of Campinas / Universidade Estadual de Campinas

Online Facility Location and Steiner Problems

Mário César San Felice¹

April 13, 2015

Examiner Board/Banca Examinadora:

• Prof. Dr. Orlando Lee (Supervisor/Orientador)
• Prof. Dr. Eduardo Candido Xavier
  Institute of Computing - UNICAMP
• Dr. Rafael Crivellari Saliba Schouery
  Institute of Computing - UNICAMP
• Prof. Dr. Daniel Morgato Martin
  Centro de Matemática, Computação e Cognição - UFABC
• Profa. Dra. Cristina Gomes Fernandes
  Institute of Mathematics and Statistics - USP
• Prof. Dr. Flávio Keidi Miyazawa
  Institute of Computing - UNICAMP (Substitute/Suplente)
• Prof. Dr. Fábio Luiz Usberti
  Institute of Computing - UNICAMP (Substitute/Suplente)
• Prof. Dr. José Coelho de Pina
  Institute of Mathematics and Statistics - USP (Substitute/Suplente)

¹ Financial support: FAPESP scholarship (process 2009/15535-1) 2010–2014 and FAPESP scholarship (process 2012/06728-3) 2012–2013.


Abstract

In this thesis we study online problems from the facility location and Steiner families, from the point of view of competitive analysis. The goal in these problems is to build a minimum cost network to serve a given demand. We present known results for the Online Facility Location problem (OFL), the Online Steiner Tree problem (OST) and the Online Single-Source Rent-or-Buy problem (OSRoB). The OFL consists of serving a set of clients by opening some facilities and by connecting each client to a facility. The OST aims to connect a set of terminals in order to create a tree network, which may contain nonterminals, called Steiner nodes. The OSRoB is a rent-or-buy version of the OST, in which all terminals must be connected to a special node called the root. The algorithms and techniques that we present for these problems play an important role in the design of our algorithms for the problems we consider.

We present new results for the Online Prize-Collecting Facility Location problem (OPFL), the Online Steiner Tree Star problem (OSTS), and the Online Connected Facility Location problem (OCFL). The OPFL is a generalization of the OFL, in which some clients may be left unconnected by paying a penalty. The OSTS is a variant of the OST, in which the nodes have non-negative costs. The OCFL is a combination of the OFL and the OST, in which a set of clients needs to be served by opening some facilities, by connecting each client to a facility, and by creating a more expensive tree network that connects the open facilities.


Resumo

In this thesis we study online problems from the facility location and Steiner families, using the competitive analysis approach. The goal in these problems is to build a minimum cost network to serve a given demand. We present known results for the Online Facility Location problem (OFL), the Online Steiner Tree problem (OST) and the Online Single-Source Rent-or-Buy problem (OSRoB). The OFL consists of serving a set of clients by opening some facilities and connecting each client to an open facility. The OST aims to connect a set of terminals using a tree, which may contain nonterminal nodes, called Steiner nodes. The OSRoB is a rent-or-buy version of the OST, in which all terminals must be connected to a special node called the root. The algorithms and techniques that we present for these problems are important in the development of our algorithms for the problems we consider.

We present new results for the Online Prize-Collecting Facility Location problem (OPFL), the Online Steiner Tree Star problem (OSTS), and the Online Connected Facility Location problem (OCFL). The OPFL is a generalization of the OFL, in which some clients may be left unconnected upon payment of penalties. The OSTS is a variant of the OST, in which the nodes have non-negative costs. The OCFL is a combination of the OFL and the OST, in which a set of clients needs to be served by opening some facilities, connecting each client to an open facility, and building a more expensive tree that connects the open facilities.


Acknowledgements

I would like to thank my family, especially my wife Mariana, who was by my side and supported me during all the difficulties and crises of this long process, and my parents Santina and Natal, who helped me to pursue my dreams, even when they were not able to comprehend them.

I greatly thank my advisor Orlando Lee, with whom I have worked for a long time, for all the patience and effort he devoted to our research project, to my training as a researcher, and to helping me overcome the hard times.

I also thank the supervisor of my research internship abroad, David P. Williamson, who taught me a lot about research in the year we worked together, and who also helped me to feel welcome and to enjoy living in a foreign country.

I would like to thank my friends, particularly Pedro Hokama, Priscila Biller and Sin-Shuen Cheung, for being there for me when I needed help in situations related or not to my PhD.

I also thank the professors of the Institute of Computing, in particular Flávio Keidi Miyazawa, Islene Caciolari Garcia, Rodolfo Jardim Azevedo and Christiane Neme Campos, for all the knowledge they helped me to obtain over the years, both in computer science and in life.

I thank the staff of the Institute of Computing for their support during the period of this work.

For the technical and financial support I thank the São Paulo Research Foundation (FAPESP).


Chapter 1 Introduction

In this thesis we study online problems from the point of view of competitive analysis. In particular, we are interested in infrastructure problems like the Online Facility Location problem (OFL) and the Online Steiner Tree problem (OST). We present known results for the OFL, the OST and the Online Single-Source Rent-or-Buy problem (OSRoB). These results (algorithms and techniques) play an important role in the design and analysis of our algorithms. We have proposed the Online Prize-Collecting Facility Location problem (OPFL), the Online Steiner Tree Star problem (OSTS) and the Online Connected Facility Location problem (OCFL). Our main contributions are competitive algorithms for these problems. Also, we give a simpler competitive analysis for an OSRoB algorithm.

In this chapter we briefly describe the online computation area, the competitive analysis approach, and the relevant combinatorial problems. Also, we present some notation and definitions that are used in the forthcoming chapters. In Chapter 2 we study competitive algorithms for the OFL and the OPFL. In Chapter 3 we study competitive algorithms for the OST, the OSRoB and the OSTS. In Chapter 4 we study competitive algorithms for the OCFL. Finally, in Chapter 5 we summarize the results we have obtained and discuss future work.

1.1 Online Computation

In online computation we deal with combinatorial optimization problems. A combinatorial optimization problem consists of a set of inputs $\mathcal{I}$, a set of solutions $\mathcal{O}$ and an objective function $C$. For each input $I \in \mathcal{I}$ there is a set of feasible solutions $F(I) \subseteq \mathcal{O}$, and for each solution $O \in F(I)$ there is a value $C(I, O)$ that represents the cost of solution $O$ with respect to the input $I$.

In a typical problem, we can usually break the input into several parts. In the offline computing model, an algorithm for the problem has access to all parts of the input before starting to build a solution. Such algorithms are called offline algorithms. In the online computing model, an algorithm has access to only one part of the input at a time, and it has to serve each part that arrives, by building a feasible solution for the parts of the input that it already knows, before receiving the next part. The decisions made by the algorithm while building the solution cannot be changed in the future. Algorithms that respect these restrictions are called online algorithms. Note that several problems fit both the offline and the online model.

1.2 Competitive Analysis

We often analyze online algorithms via competitive analysis [2]. Competitive analysis resembles the worst case guarantee analysis of approximation algorithms, and is used to ensure a quality guarantee for an online algorithm. A ratio, called competitive ratio, is used to quantify the quality of the solutions of the algorithm being analyzed. This ratio arises from the comparison between the cost of a solution for an online algorithm ALG and the cost of a solution for an optimal offline algorithm OPT. For any input I, let ALG(I) and OPT(I) be, respectively, the cost paid by ALG and OPT to serve I. We say that a deterministic ALG for a minimization problem is c-competitive if, for every input I, the following inequality holds:

ALG(I) ≤ c · OPT(I) + κ , (1.1)

where κ is a constant that does not depend on I. Note that we can define the competitive ratio for a maximization problem in a similar way. Also, we define the expected competitive ratio for a randomized algorithm by applying expectation to inequality (1.1), i.e.:

E[ALG(I)] ≤ c · OPT(I) + κ . (1.2)

All the results presented in this work achieve strict competitive ratios, i.e., κ = 0.
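To make inequality (1.1) concrete, the following Python sketch checks it on a finite list of measured costs. The function names `is_c_competitive` and `empirical_ratio`, as well as the idea of sampling instances, are illustrative assumptions and not part of the thesis. A finite sample can only refute a claimed ratio or give a lower bound on it; it never proves competitiveness.

```python
def is_c_competitive(costs, c, kappa=0.0):
    """Check ALG(I) <= c * OPT(I) + kappa on every sampled instance.

    `costs` is an iterable of (alg_cost, opt_cost) pairs, one per input I.
    With kappa = 0 this is the strict competitive ratio used in this work.
    """
    return all(alg <= c * opt + kappa for alg, opt in costs)


def empirical_ratio(costs):
    """Largest observed ALG(I) / OPT(I); a lower bound on the strict ratio."""
    return max(alg / opt for alg, opt in costs if opt > 0)
```

For instance, with the hypothetical measurements `[(3, 2), (4, 4)]`, the check succeeds for c = 1.5 and fails for c = 1.2.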

One way to think about competitive analysis is as a game between an online algorithm ALG and a malicious adversary ADV. The adversary can be seen as an algorithm that receives ALG as an input, and builds an input $I^*$ for ALG that maximizes the ratio $\mathrm{ALG}(I^*)/\mathrm{OPT}(I^*)$. For sufficiently large inputs, the maximum value of this ratio is a lower bound on the competitive ratio of ALG. Thus, we may think that the goal of ALG is to minimize the cost to serve $I^*$, while the goal of ADV is to build the worst input to ALG, i.e., one that maximizes this ratio.

1.3 Facility Location problems

We are interested in metric versions of problems from the Facility Location family. Roughly speaking, a typical problem in this family consists of serving a set of clients by opening some facilities and connecting each client to a facility, where the cost to connect a client to a facility is given by a metric function.

1.3.1 Online Facility Location problem

In the Uncapacitated Facility Location problem (FL), we have a set of clients and a set of possible facilities in a metric space. Each facility has a cost associated with opening it. The cost of assigning a client to a facility is the distance between the two points. The goal of the problem consists of selecting a set of facilities that will be opened, and assigning each client to an open facility, so that the total cost of opening the facilities plus the cost of connecting the clients is minimized. The FL is an NP-hard problem that has been well-studied; several constant ratio approximation algorithms are known for it [26, 22, 4, 21]. It is remarkable that a great variety of combinatorial techniques, such as LP rounding, primal-dual method and local search, were successful at achieving good approximation ratios for this problem.

The online version of the FL is the Online Facility Location problem (OFL), in which the clients are revealed one at a time and each one needs to be connected to an open facility before the next one arrives. As time progresses, no connection can be changed and no open facility can be closed. There are randomized and deterministic O(log n)-competitive algorithms known for the OFL [23, 11, 10, 24, 12], where n is the number of clients. Also, the best lower bound for the competitive ratio of an algorithm for the OFL is $\Omega\left(\frac{\log n}{\log \log n}\right)$, due to Fotakis [11].

1.3.2 Online Prize-Collecting Facility Location problem

The Prize-Collecting Facility Location problem (PFL) is a generalization of the FL in which some clients may be left unconnected by paying a penalty. Another way to think about this problem is that every client has a prize that can only be collected if it is connected to some facility. There is a constant ratio approximation algorithm known for the PFL, due to Xu and Xu [31], that uses a combination of a primal-dual algorithm with local search techniques.

The Online Prize-Collecting Facility Location problem (OPFL) is the online version of the PFL. In [6], San Felice, Cheung, Lee and Williamson proposed this problem and gave a primal-dual O(log n)-competitive algorithm for it, which is inspired by a previous algorithm for the OFL, due to Fotakis [10] and Nagarajan and Williamson [24]. Notice that, since the OPFL is a generalization of the OFL, the lower bound of $\Omega\left(\frac{\log n}{\log \log n}\right)$ applies to it.

1.4 Steiner problems

We are interested in graph problems from the Steiner family. Roughly speaking, a typical problem in this family consists of connecting a set of terminals in order to create a connected network. This network may contain nonterminals, called Steiner nodes.

1.4.1 Online Steiner Tree problem

The Steiner Tree problem (ST) is a network design problem defined in a graph with edge costs. Its input consists of a graph G and a subset of nodes of G, which we call terminals. A solution for the ST is a tree in G that contains all terminals and that may contain Steiner nodes. The goal is to minimize the total cost of the edges in the tree. The ST is a well-studied NP-hard problem for which several constant ratio approximation algorithms are known [29, 30]. Many important techniques were used in these algorithms, such as greedy strategy, primal-dual method, and randomized rounding.

The online version of the ST is the Online Steiner Tree problem (OST), in which the terminals are revealed one at a time and each one needs to be connected to the current tree before the next one arrives. Also, no edge in the tree can be removed in the future. There are O(log n)-competitive algorithms known for the OST [18, 3], where n is the number of terminals. These algorithms are asymptotically optimal, in the sense that there is a lower bound of Ω(log n) for the competitive ratio of any online algorithm for the OST, due to Imase and Waxman [18].

1.4.2 Online Single-Source Rent-or-Buy problem

The Single-Source Rent-or-Buy problem (SRoB) is a rent-or-buy version of the ST, in which all terminals must be connected to a special node called root. To connect a terminal one may rent or buy edges. If an edge is rented, only one terminal can use it. If an edge is bought, any terminal may use it. However, the cost of buying an edge is M times greater than the cost of renting an edge, where M is a parameter of the input. There are constant ratio approximation algorithms known for the SRoB [14, 15, 13].
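The rent-or-buy trade-off on a single edge can be illustrated with a few lines of Python. This is only a sketch under our own assumptions: the function name `edge_usage_cost` is hypothetical, and it looks at one edge in isolation, whereas a solution for the SRoB must choose routes and rent-or-buy decisions for the whole network.

```python
def edge_usage_cost(length, users, M):
    """Cheapest way to let `users` terminals cross one edge of rental cost `length`.

    Renting is paid once per terminal that uses the edge; buying costs
    M times the rental cost and then serves any number of terminals.
    """
    rent_total = users * length  # each terminal rents the edge for itself
    buy_total = M * length       # the edge is bought once and shared
    return min(rent_total, buy_total)
```

Under this per-edge view, buying pays off exactly when at least M terminals share the edge.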

The Online Single-Source Rent-or-Buy problem (OSRoB) is the online version of the SRoB, and also a rent-or-buy version of the OST. In the OSRoB the terminals arrive one at a time, and each one must be connected to the root before the next one arrives. There are O(log n)-competitive algorithms known for the OSRoB, due to Awerbuch et al. [1]. Since the OST can be reduced to the OSRoB, there is a lower bound of Ω(log n) for the competitive ratio of any algorithm for the OSRoB.

1.4.3 Online Steiner Tree Star problem

The Steiner Tree Star problem (STS) is a variant of the ST. The input to the STS is the same as the ST, except that there is a non-negative cost associated with each node. A solution for the STS is a tree that contains all terminals and that may contain Steiner nodes. The goal is to minimize the total cost of the edges in the tree, plus the cost of the internal nodes of the tree, where a node is internal if its degree is greater than 1. The STS was proposed, and constant ratio approximation algorithms were given to it, by Khuller and Zhu [20].
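The STS objective described above can be evaluated directly from a candidate tree. The sketch below is an illustration under our own assumptions: the name `sts_cost` and the dictionary-based representation of costs are hypothetical, and the function only evaluates a given tree; it does not search for a good one.

```python
from collections import Counter


def sts_cost(tree_edges, edge_cost, node_cost):
    """Cost of a tree in the STS: edge costs plus costs of internal nodes.

    `tree_edges` is a list of (u, v) pairs, `edge_cost` maps each pair to
    its cost and `node_cost` maps each node to its non-negative cost.
    A node is internal if its degree in the tree is greater than 1.
    """
    degree = Counter()
    for u, v in tree_edges:
        degree[u] += 1
        degree[v] += 1
    edges_total = sum(edge_cost[e] for e in tree_edges)
    internal_total = sum(node_cost[v] for v, deg in degree.items() if deg > 1)
    return edges_total + internal_total
```

On a star with center `s` and three leaves, only the cost of `s` is added to the edge costs, since the leaves have degree 1.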

The OSTS is the online version of the STS, in which the terminals arrive one at a time. Also, no edge or node of the tree can be removed in the future. We proposed this problem and gave a primal-dual $O(\log^2 n)$-competitive algorithm for it. Notice that, since the OSTS is a generalization of the OST, the lower bound of Ω(log n) applies to the competitive ratio of algorithms for the OSTS.

1.5 Online Connected Facility Location problem

The Connected Facility Location problem (CFL) is a network design problem with two layers; it is motivated by the necessity of building networks in which the end users are connected to servers, with less expensive lower bandwidth connections, and the servers are connected to each other, through more expensive higher bandwidth connections. The input to the CFL is the same as the FL, except that there is a facility that is designated to be the root, that represents the connection of the network to the outside world, and a parameter M ≥ 1, which is a cost scaling factor.

A solution for the CFL is a set of open facilities (including the root), an assignment of clients to open facilities, and a tree containing the open facilities. The goal of the problem is to minimize the total cost of opening facilities, plus the total cost of connecting clients to their assigned facilities, plus M times the cost of the edges in the tree containing the open facilities. The CFL is an NP-hard problem; it has randomized and deterministic constant ratio approximation algorithms [13, 16, 27, 17, 19, 5] that use techniques such as sample-and-augment, LP rounding and primal-dual method. The CFL can be seen as a combination of the FL with the ST, using the cost scaling factor M.
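The three terms of the CFL objective can be written down as a small evaluation routine. This is a sketch under assumptions of ours: the name `cfl_cost` and the convention that `d` is a dictionary indexed by pairs of points are hypothetical, and the function does not check that the tree actually spans the open facilities.

```python
def cfl_cost(open_facilities, assignment, tree_edges, f, d, M):
    """Evaluate a CFL solution.

    open_facilities: iterable of open facilities (including the root)
    assignment:      dict client -> open facility serving it
    tree_edges:      list of (u, v) edges of the tree on the open facilities
    f:               dict facility -> opening cost
    d:               dict (u, v) -> distance, M: cost scaling factor (M >= 1)
    """
    opening = sum(f[i] for i in open_facilities)
    connection = sum(d[j, i] for j, i in assignment.items())
    tree = M * sum(d[u, v] for u, v in tree_edges)
    return opening + connection + tree
```

Setting `tree_edges` to an empty list recovers the FL objective, which matches the view of the CFL as a combination of the FL with the ST.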

The online version of the CFL is the Online Connected Facility Location problem (OCFL), in which the clients are revealed one at a time and each one needs to be connected to a facility before the next one arrives. If a new facility is opened, it needs to be connected to the tree containing the other opened facilities immediately. Also, no connection can be changed, no opened facility can be closed, and no edge used in the tree can be removed in the future. We can also view the OCFL as the combination of the OFL and the OST, using a cost scaling factor M. Since the OST can be reduced to the OCFL, there is a lower bound of Ω(log n) for the competitive ratio of any algorithm for the OCFL.

In [7], San Felice, Williamson and Lee proposed the OCFL and presented a randomized $O(\log^2 n)$-competitive algorithm for it, where n is the number of clients. That algorithm combines the sample-and-augment technique of Gupta, Kumar, Pál, and Roughgarden [13] with ideas of previous algorithms for the OFL [10, 24] and the OST [18]. Also in [7], they showed that the same algorithm is a deterministic O(log n)-competitive algorithm for the special case of the OCFL in which M = 1.

In [9] we used a more sophisticated analysis to show that the algorithm presented in [7] is O(log n)-competitive for the OCFL. Umboh [28] has independently obtained a deterministic O(log n)-competitive algorithm for this problem.

1.6 Notation and Definitions

We assume that the reader is familiar with basic concepts and terminology from analysis of algorithms and graph theory. In this section we present notation and definitions that are used throughout the following chapters. More specific notation is presented later, when it is needed. Due to the use of several symbols in this text, we found it convenient to add a table of symbols at the end of this document. Should the reader forget the meaning of a symbol or a notation, it can be found in this table.

General Notation. For any real number $v$, we define $(v)^+ = \max\{0, v\}$ and, for any positive integer $n$, we denote the $n$-th harmonic number by $H_n = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}$.
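Both definitions translate directly into code. The sketch below is an illustration only; the names `pos` and `harmonic` are our own, and exact fractions are used so that small values of the harmonic number can be compared without rounding error.

```python
from fractions import Fraction


def pos(v):
    """Positive part (v)^+ = max{0, v}."""
    return max(0, v)


def harmonic(n):
    """n-th harmonic number H_n = 1 + 1/2 + ... + 1/n, as an exact fraction."""
    return sum(Fraction(1, k) for k in range(1, n + 1))
```

For example, `harmonic(4)` evaluates to the fraction 25/12.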

In general, the problems we study receive as input a graph G = (V, E), with non-negative edge costs. We use d to denote both the edge cost function and the distance function based on shortest paths. Thus, for any e in E, we denote the cost of edge e by d(e) and, for any j and i in V, we denote the distance (i.e., the length of a shortest path) from j to i by d(j, i). Also, we denote the subgraph of a shortest path from j to i by path(j, i). In general, we denote a subgraph J by a tuple (V(J), E(J)).

For S ⊆ V , we denote a closest node to j in S by closest(j, S) (if there are more than one, we pick any one of them arbitrarily), the distance from j to this node by d(j, S), and a shortest path from j to this node by path(j, S).
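The distance notation above can be sketched in Python as follows. This is an illustration under assumptions of ours: the helper names `shortest_path_metric` and `dist_to_set` are hypothetical, the graph is taken to be undirected, the Floyd-Warshall algorithm is our choice for computing d(j, i), and the distance to an empty set is defined as infinity for convenience.

```python
import math


def shortest_path_metric(nodes, edges):
    """All-pairs shortest path distances d(j, i) via Floyd-Warshall.

    `edges` maps undirected pairs (u, v) to non-negative costs d(e).
    """
    d = {(u, v): (0.0 if u == v else math.inf) for u in nodes for v in nodes}
    for (u, v), cost in edges.items():
        d[u, v] = min(d[u, v], cost)
        d[v, u] = min(d[v, u], cost)
    for k in nodes:
        for u in nodes:
            for v in nodes:
                if d[u, k] + d[k, v] < d[u, v]:
                    d[u, v] = d[u, k] + d[k, v]
    return d


def closest(j, S, d):
    """closest(j, S): a node of S nearest to j (ties broken arbitrarily)."""
    return min(S, key=lambda i: d[j, i])


def dist_to_set(j, S, d):
    """d(j, S): distance from j to its closest node in S (infinity if S is empty)."""
    return min((d[j, i] for i in S), default=math.inf)
```

Computing the metric once up front keeps later lookups of d(j, i) and d(j, S) simple, at the price of cubic preprocessing time.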

When dealing with subsets (or subgraphs), we denote the configuration of a subset (or a subgraph) $A$ after the first $k$ parts of the input have arrived by $A_k$.

Facility Location Notation. When dealing with facility location problems, in general, we denote a client by $j$ and a facility by $i$. Note that $j$ also represents the location of the client and $i$ also represents the location of the facility. Also, we denote the set of clients by $D$, the set of points in which a facility may be opened by $F$, and the set of open (or active) facilities by $F^a$.

When the arrival order matters we denote the position of client $j$ in the input sequence by $n(j)$ and, if it is useful for the analysis, we use indexed clients, denoting the $k$-th client to arrive by $j_k$. Also, we denote the set of the first $k$ clients that arrived by $D_k$, and the set of open facilities that serve these clients by $F^a_k$.

We use the convention that the cost for opening a facility is given by a function f, i.e., we denote the cost for opening a facility i by f(i). Also, we denote the facility to which client j is connected in the solution by a(j), and the facility to which client j is connected in an offline optimal solution by a∗(j).

Steiner Notation. When dealing with Steiner problems, in general, we denote a terminal by j and a node by i. Also, we denote the set of terminals by D, and the tree that connects the terminals by a subgraph T = (V(T), E(T)).

When the arrival order matters we denote the position of terminal $j$ in the input sequence by $n(j)$ and, if it is useful for the analysis, we use indexed terminals, denoting the $k$-th terminal to arrive by $j_k$. Also, we denote the set of the first $k$ terminals to arrive by $D_k$.

Chapter 2 Facility Location problems

In this chapter we study two problems from the Facility Location family. In Section 2.1 we focus on the OFL: we present a well-known primal-dual algorithm for this problem, and its competitive analysis. In Section 2.2 we focus on the OPFL, presenting a new primal-dual algorithm for this problem, and its competitive analysis. An article with these results was accepted at the VIII Latin-American Graph and Optimization Symposium (LAGOS 2015) [6].

2.1 Online Facility Location problem

The Online Facility Location problem (OFL) is the online version of the FL. We present and analyze a primal-dual O(log n)-competitive algorithm for the OFL by Nagarajan and Williamson [24], which is equivalent to a previous algorithm due to Fotakis [10]. We rely on this algorithm in other places of the text.

The input for the OFL is a complete graph $G = (V, E)$, a distance function $d : E \to \mathbb{R}_+$ that respects the triangle inequality, a set of nodes in which a facility may be opened $F \subseteq V$, a facility opening cost function $f : F \to \mathbb{R}_+$ and a set of clients $D \subseteq V$. Henceforth, we shall denote the non-online part of the input for the OFL by a tuple $(G, d, F, f)$, leaving implicit that $G = (V, E)$.

We assume that initially all facilities are closed. The clients in D arrive one at a time, and each one that arrives must be served before the next one does. To serve a client the algorithm may connect it to a previously opened facility, or open a new facility and connect the client to it. All decisions of the algorithm are irrevocable. This means that the algorithm cannot remove from the current solution any facility previously opened, or change to which facility a client is connected, even if a closer facility was opened later.

We want to minimize the total cost, which is the sum of the costs of the open facilities $F^a$, plus the distance from each client $j$ to its assigned open facility $a(j)$ in $F^a$. More precisely, we want $F^a$ and $a$ that minimize:

\[
\sum_{i \in F^a} f(i) + \sum_{j \in D} d(j, a(j)) . \tag{2.1}
\]
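Objective (2.1) is straightforward to evaluate for a given solution. The following sketch assumes, for illustration only, that `f` is a dictionary of opening costs and `d` a dictionary of distances indexed by (client, facility) pairs; the name `fl_cost` is our own.

```python
def fl_cost(open_facilities, assignment, f, d):
    """Value of objective (2.1) for a solution (F^a, a).

    open_facilities: iterable with the open facilities F^a
    assignment:      dict client j -> facility a(j), with a(j) in F^a
    f:               dict facility -> opening cost f(i)
    d:               dict (j, i) -> distance d(j, i)
    """
    opening = sum(f[i] for i in open_facilities)
    connection = sum(d[j, i] for j, i in assignment.items())
    return opening + connection
```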

2.1.1 Linear Programming Relaxation and Its Dual

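We present a well-known linear programming relaxation of the OFL

\[
\begin{array}{lll}
\min & \sum_{i \in F} f(i)\, y_i + \sum_{j \in D} \sum_{i \in F} d(j, i)\, x_{ji} & \\
\text{s.t.} & x_{ji} \le y_i & \text{for } j \in D \text{ and } i \in F, \\
& \sum_{i \in F} x_{ji} \ge 1 & \text{for } j \in D, \\
& y_i \ge 0,\ x_{ji} \ge 0 & \text{for } j \in D \text{ and } i \in F,
\end{array}
\]

and its dual

\[
\begin{array}{lll}
\max & \sum_{j \in D} \alpha_j & \\
\text{s.t.} & \sum_{j \in D} \beta_{ji} \le f(i) & \text{for } i \in F, \\
& \alpha_j - \beta_{ji} \le d(j, i) & \text{for } j \in D \text{ and } i \in F, \\
& \alpha_j \ge 0,\ \beta_{ji} \ge 0 & \text{for } j \in D \text{ and } i \in F.
\end{array}
\]

Since the second constraint of the dual ensures that $\alpha_j - d(j, i) \le \beta_{ji}$, we can eliminate the variables $\beta_{ji}$ in the first constraint, obtaining the following compact and equivalent dual:

\[
\begin{array}{lll}
\max & \sum_{j \in D} \alpha_j & \\
\text{s.t.} & \sum_{j \in D} (\alpha_j - d(j, i))^+ \le f(i) & \text{for } i \in F, \\
& \alpha_j \ge 0 & \text{for } j \in D.
\end{array}
\]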
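Feasibility for the compact dual can be tested facility by facility. The sketch below is an illustration under our own assumptions: the name `is_dual_feasible`, the dictionary-based inputs and the numerical tolerance `tol` are not part of the thesis.

```python
def is_dual_feasible(alpha, f, d, tol=1e-9):
    """Check the compact dual constraints for a vector alpha.

    alpha: dict client j -> alpha_j
    f:     dict facility i -> opening cost f(i)
    d:     dict (j, i) -> distance d(j, i)
    Returns True if alpha_j >= 0 for all j and, for every facility i,
    the sum over clients of (alpha_j - d(j, i))^+ is at most f(i).
    """
    if any(a < 0 for a in alpha.values()):
        return False
    for i in f:
        paid = sum(max(0.0, a - d[j, i]) for j, a in alpha.items())
        if paid > f[i] + tol:
            return False
    return True
```

By weak duality, the sum of the values of any feasible `alpha` is a lower bound on the cost of every feasible solution of the relaxation, which is why such a check is useful when experimenting with dual-based bounds.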

The dual variable $\alpha_j$ is explicitly used in some facility location algorithms, and may be interpreted as the amount that client $j$ is willing to pay in order to connect to a facility, or in order to open a facility that is closer to it than the current open facilities.

2.1.2 OFL Algorithm

In this subsection we describe a primal-dual algorithm for the OFL, due to Nagarajan and Williamson [24]. It is equivalent to an algorithm proposed by Fotakis [10], but we decided to present the former algorithm because it is simpler to understand and analyze. Let us give a high level description of the algorithm, whose pseudo-code is presented in Algorithm 1. Let (G, d, F, f) be the non-online part of the input for the OFL Algorithm, and recall that it receives one client at a time and must serve it.

Input: $(G, d, F, f)$

$F^a \leftarrow \emptyset$; $D \leftarrow \emptyset$;
while a new client $j'$ arrives do
    increase $\alpha_{j'}$ until one of the following happens:
        (a) $\alpha_{j'} = d(j', i)$ for some $i \in F^a$;
        (b) $f(i) = (\alpha_{j'} - d(j', i)) + \sum_{j \in D} (d(j, F^a) - d(j, i))^+$ for some $i \in F \setminus F^a$;
    $F^a \leftarrow F^a \cup \{i\}$; $D \leftarrow D \cup \{j'\}$; $a(j') \leftarrow i$;
end
return $(F^a, a)$;

Algorithm 1: OFL Algorithm.

Every time a new client $j'$ arrives, the algorithm increases its dual variable $\alpha_{j'}$ until either (a) the client is connected to a facility that is already open, or (b) a new facility is opened and the client is connected to it. Note that in case (b) the cost of the new facility $i$ is paid by the $\alpha$ variable of the current client $j'$, and by contributions from the other clients that are closer to $i$ than to the previously opened facilities.

The solution built by the algorithm consists of a set of open (or active) facilities $F^a$, and a function $a$ that assigns each client to an open facility.
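The continuous increase of the dual variable in Algorithm 1 can be simulated by computing, for each candidate event, the value of the dual variable at which it would trigger, and taking the smallest one. The Python sketch below follows this idea. It is an illustration under assumptions of ours, not the authors' implementation: the function name `ofl_primal_dual` is hypothetical, distances are given as a dictionary indexed by (client, facility) pairs, client identifiers are assumed to be distinct, and ties between cases (a) and (b) are broken in favor of case (a).

```python
import math


def ofl_primal_dual(facilities, f, d, clients):
    """Simulate Algorithm 1 on clients arriving in the given order.

    Returns (open_facs, assignment, alpha), i.e. F^a, a and the dual values.
    """
    open_facs = set()
    assignment = {}  # a(j)
    alpha = {}       # alpha_j
    nearest = {}     # d(j, F^a) for every client j already served
    for j_new in clients:
        # Case (a): alpha reaches the distance to the closest open facility.
        best_open = min(open_facs, key=lambda i: d[j_new, i], default=None)
        alpha_a = d[j_new, best_open] if best_open is not None else math.inf
        # Case (b): alpha plus the contributions of earlier clients pays f(i).
        alpha_b, best_closed = math.inf, None
        for i in facilities:
            if i in open_facs:
                continue
            contrib = sum(max(0.0, nearest[j] - d[j, i]) for j in nearest)
            threshold = d[j_new, i] + f[i] - contrib
            if threshold < alpha_b:
                alpha_b, best_closed = threshold, i
        if alpha_a <= alpha_b:  # assumed tie-breaking: prefer case (a)
            alpha[j_new], chosen = alpha_a, best_open
        else:
            alpha[j_new], chosen = alpha_b, best_closed
            open_facs.add(chosen)
            for j in nearest:
                nearest[j] = min(nearest[j], d[j, chosen])
        assignment[j_new] = chosen
        nearest[j_new] = min(d[j_new, i] for i in open_facs)
    return open_facs, assignment, alpha
```

As a small usage example, consider points on a line with facilities A at 0 and B at 10, both with opening cost 2, and clients arriving at positions 0, 1 and 10. The sketch opens A for the first client, connects the second client to A, and opens B for the third one. Recomputing the contributions from scratch for each closed facility keeps the code close to the pseudo-code, at the price of efficiency.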

2.1.3 Competitive Analysis of the Algorithm

In this subsection we analyze the competitiveness of the OFL Algorithm.

During the analysis we denote the set of clients by $D_n$, and the total number of clients by $n = |D_n|$. Consider the solution $(F^a_n, a)$ computed by the OFL Algorithm to serve $D_n$. So

\[
\mathrm{ALG}_{\mathrm{OFL}}(D_n) = \sum_{i \in F^a_n} f(i) + \sum_{j \in D_n} d(j, a(j)) \tag{2.2}
\]

is the cost paid by the OFL Algorithm to serve $D_n$.

Similarly, consider the FL offline optimal solution $(F^*_n, a^*)$ with which we are comparing. So

\[
\mathrm{OPT}_{\mathrm{FL}}(D_n) = \sum_{i \in F^*_n} f(i) + \sum_{j \in D_n} d(j, a^*(j)) \tag{2.3}
\]

is the cost of the optimal solution.

The next theorem is the main result of this subsection.

Theorem 2.1.1.

\[
\mathrm{ALG}_{\mathrm{OFL}}(D_n) \le 4 \log n \; \mathrm{OPT}_{\mathrm{FL}}(D_n) .
\]

Lemma 2.1.1. For any $i$ in $F$ and any iteration $k$, we have that:

\[
f(i) \ge \sum_{j \in D_k} (d(j, F^a_k) - d(j, i))^+ . \tag{2.4}
\]

Proof. Suppose, as our induction hypothesis, that $f(i) \ge \sum_{j \in D_{k-1}} (d(j, F^a_{k-1}) - d(j, i))^+$ holds for any $i \in F$. Considering just the $i$ in $F^a_k$, we have that:

\[
f(i) \ge \sum_{j \in D_k} (d(j, F^a_k) - d(j, i))^+ , \tag{2.5}
\]

because $d(j, F^a_k) \le d(j, i)$ for any $i$ in $F^a_k$.

Now we consider the $i$ in $F \setminus F^a_k$. Case (b) of the algorithm implies that $f(i) \ge \alpha_k - d(j_k, i) + \sum_{j \in D_{k-1}} (d(j, F^a_{k-1}) - d(j, i))^+$. Thus, due to case (b) and the induction hypothesis, for any $i$ in $F \setminus F^a_k$ we have

\[
f(i) \ge (\alpha_k - d(j_k, i))^+ + \sum_{j \in D_{k-1}} (d(j, F^a_{k-1}) - d(j, i))^+ . \tag{2.6}
\]

If client $j_k$ was served by case (a) we have $\alpha_k = d(j_k, F^a_{k-1})$. So

\[
\begin{aligned}
f(i) &\ge (\alpha_k - d(j_k, i))^+ + \sum_{j \in D_{k-1}} (d(j, F^a_{k-1}) - d(j, i))^+ \\
&= (d(j_k, F^a_{k-1}) - d(j_k, i))^+ + \sum_{j \in D_{k-1}} (d(j, F^a_{k-1}) - d(j, i))^+ \\
&= \sum_{j \in D_k} (d(j, F^a_k) - d(j, i))^+ ,
\end{aligned} \tag{2.7}
\]

where the last equality follows because if $j_k$ was served by case (a) then $F^a_{k-1} = F^a_k$.

If client $j_k$ was served by case (b) and facility $i'$ was opened we have $f(i') = \alpha_k - d(j_k, i') + \sum_{j \in D_{k-1}} (d(j, F^a_{k-1}) - d(j, i'))^+$. Since, by the induction hypothesis, $f(i') \ge \sum_{j \in D_{k-1}} (d(j, F^a_{k-1}) - d(j, i'))^+$, we have that $\alpha_k \ge d(j_k, i')$. So

\[
\begin{aligned}
f(i) &\ge (\alpha_k - d(j_k, i))^+ + \sum_{j \in D_{k-1}} (d(j, F^a_{k-1}) - d(j, i))^+ \\
&\ge (d(j_k, i') - d(j_k, i))^+ + \sum_{j \in D_{k-1}} (d(j, F^a_{k-1}) - d(j, i))^+ \\
&\ge (d(j_k, F^a_k) - d(j_k, i))^+ + \sum_{j \in D_{k-1}} (d(j, F^a_k) - d(j, i))^+ \\
&= \sum_{j \in D_k} (d(j, F^a_k) - d(j, i))^+ ,
\end{aligned} \tag{2.8}
\]

where the last inequality follows because $i'$ is the closest facility to $j_k$ and $F^a_k = F^a_{k-1} \cup \{i'\}$.

Lemma 2.1.2. Suppose that at iteration $k$ a new facility $i_k$ is opened. Then, for any client $j$ in $D_{k-1}$, we have that:

\[
(\alpha_j - d(j, F^a_k))^+ = (\alpha_j - d(j, F^a_{k-1}))^+ + (d(j, F^a_{k-1}) - d(j, i_k))^+ .
\]

Proof. Note that any client $j$ in $D_{k-1}$ was connected, by case (a) or (b) of the algorithm, to some facility $i$ in $F^a_{k-1}$. If $j$ was connected by case (a) we have that $\alpha_j = d(j, i) \ge d(j, F^a_{k-1})$. Otherwise, due to case (b) we have that $f(i) = \alpha_j - d(j, i) + \sum_{j \in D_{n(j)-1}} (d(j, F^a_{n(j)-1}) - d(j, i))^+$. Using Lemma 2.1.1 we have that $\alpha_j \ge d(j, i) \ge d(j, F^a_{k-1})$. Thus, $\alpha_j \ge d(j, F^a_{k-1})$ and it suffices to show that:

\[
(\alpha_j - d(j, F^a_k)) = (\alpha_j - d(j, F^a_{k-1})) + (d(j, F^a_{k-1}) - d(j, i_k))^+ . \tag{2.9}
\]

If $i_k$ is not the closest facility to $j$ in $F^a_k$ (i.e. $d(j, i_k) > d(j, F^a_{k-1})$), then the previous equation holds, because $d(j, F^a_{k-1}) = d(j, F^a_k)$ and $(d(j, F^a_{k-1}) - d(j, i_k))^+ = 0$.

Otherwise (i.e. $d(j, i_k) \le d(j, F^a_{k-1})$), we have that $(d(j, F^a_{k-1}) - d(j, i_k))^+ = (d(j, F^a_{k-1}) - d(j, i_k))$ and $d(j, i_k) = d(j, F^a_k)$. Thus, (2.9) holds, and this concludes the proof.
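The identity of Lemma 2.1.2 only involves three numbers per client, so it can be checked numerically. The sketch below is an illustration under our own naming: `d_old` stands for the distance from the client to the facilities open before iteration k, `d_ik` for its distance to the newly opened facility, and the function name `lemma_2_1_2_holds` is hypothetical. The proof relies on the dual value being at least `d_old`, and the check may fail when this precondition is dropped.

```python
def pos(v):
    """Positive part (v)^+ = max{0, v}."""
    return max(0, v)


def lemma_2_1_2_holds(alpha, d_old, d_ik):
    """Check (alpha - d_new)^+ == (alpha - d_old)^+ + (d_old - d_ik)^+.

    d_new = min(d_old, d_ik) is the distance to the closest open facility
    after the new facility is opened.
    """
    d_new = min(d_old, d_ik)
    return pos(alpha - d_new) == pos(alpha - d_old) + pos(d_old - d_ik)
```

For instance, with alpha = 5, d_old = 3 and d_ik = 1 both sides equal 4, while alpha = 1, d_old = 3 and d_ik = 0 violates the precondition and the two sides differ.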

Now we bound the cost of the algorithm using the dual variables.

Lemma 2.1.3.

ALGOFL(Dn) ≤ 2 ·

X

j∈Dn

αj . Proof. We show that the inequalities

X j∈Dn d(j, a(j)) ≤ X j∈Dn αj , (2.10) and X i∈Fa n f(i) ≤ X j∈Dn αj , (2.11)

hold, which implies the desired result.

First we show that (2.10) holds. Consider a generic iteration, let us say $k$, in which client $j_k$ arrives. By the way the algorithm works, when $a(j_k)$ is set for the first time, it never changes again. Moreover, the value of $\alpha_k$ never decreases, being defined in case (a) or (b) of the algorithm. If case (a) happens then $d(j_k, a(j_k)) = \alpha_k$. If case (b) happens then $d(j_k, a(j_k)) \le \alpha_k$. Thus, we have that $\sum_{j \in D_n} d(j, a(j)) \le \sum_{j \in D_n} \alpha_j$.

Now let us show that (2.11) holds. To do so, we show that after the algorithm serves the $k$-th client, the following equation holds:
\[
\sum_{i \in F^a_k} f(i) = \sum_{j \in D_k} (\alpha_j - d(j, F^a_k))^+ = \sum_{l=1}^{k} (\alpha_l - d(j_l, F^a_k))^+ . \tag{2.12}
\]

Recall that $\alpha_j$ can be viewed as the value that client $j$ offers for opening facilities close to it, and $d(j, F^a_k)$ as the part of $\alpha_j$ still available for paying for new facilities. So, $(\alpha_j - d(j, F^a_k))^+$ is the part of $\alpha_j$ already used in the opening of one or more facilities. Under this interpretation, a client $j$ contributes to the opening of a facility $i$ only if $i$ is closer to $j$ than any of the already open facilities. So, a client $j$ will contribute to opening a facility $i$ at iteration $k$ only if $i$ is inside the ball of center $j$ and radius $d(j, F^a_k)$.

The proof of (2.12) is by induction on k.

Base: In the beginning, equation (2.12) holds because $F^a_0 = D_0 = \emptyset$.

Induction Hypothesis: Assume that after the first $k - 1$ clients were served, equation (2.12) with $k$ replaced by $k - 1$ holds, i.e.
\[
\sum_{i \in F^a_{k-1}} f(i) = \sum_{j \in D_{k-1}} (\alpha_j - d(j, F^a_{k-1}))^+ = \sum_{l=1}^{k-1} (\alpha_l - d(j_l, F^a_{k-1}))^+ .
\]

Step: We show that equation (2.12) holds after the k-th client is served by the

algorithm.

Since a client is served according to case (a) or (b) of the algorithm, we analyze each one of them. Suppose that the algorithm connects $j_k$ to the facility $i_k$ when serving it. Notice that $i_k$ may already be open. First, assume that case (a) happened. In this case we have that $i_k \in F^a_{k-1}$ and, hence, $F^a_k = F^a_{k-1}$. So:
\[
\sum_{i \in F^a_k} f(i) = \sum_{i \in F^a_{k-1}} f(i) = \sum_{j \in D_{k-1}} (\alpha_j - d(j, F^a_{k-1}))^+ = \sum_{j \in D_k} (\alpha_j - d(j, F^a_k))^+ , \tag{2.13}
\]
where the first equality holds because $F^a_k = F^a_{k-1}$, the second equality follows from the induction hypothesis, and the last equality holds because $F^a_k = F^a_{k-1}$ and $\alpha_k = d(j_k, F^a_k)$.

Now suppose that case (b) happened. In this case a new facility $i_k$ is opened and we have:
\[
f(i_k) = (\alpha_k - d(j_k, i_k)) + \sum_{j \in D_{k-1}} (d(j, F^a_{k-1}) - d(j, i_k))^+ . \tag{2.14}
\]

To see that (2.12) holds, note that:
\[
\begin{aligned}
\sum_{i \in F^a_k} f(i) &= \sum_{i \in F^a_{k-1}} f(i) + f(i_k) \\
&= \sum_{j \in D_{k-1}} (\alpha_j - d(j, F^a_{k-1}))^+ + f(i_k) \\
&= \sum_{j \in D_{k-1}} (\alpha_j - d(j, F^a_{k-1}))^+ + (\alpha_k - d(j_k, i_k)) + \sum_{j \in D_{k-1}} (d(j, F^a_{k-1}) - d(j, i_k))^+ \\
&= \sum_{j \in D_{k-1}} \left[ (\alpha_j - d(j, F^a_{k-1}))^+ + (d(j, F^a_{k-1}) - d(j, i_k))^+ \right] + (\alpha_k - d(j_k, i_k)) \\
&= \sum_{j \in D_{k-1}} (\alpha_j - d(j, F^a_k))^+ + (\alpha_k - d(j_k, i_k))^+ \\
&= \sum_{j \in D_{k-1}} (\alpha_j - d(j, F^a_k))^+ + (\alpha_k - d(j_k, F^a_k))^+ \\
&= \sum_{j \in D_k} (\alpha_j - d(j, F^a_k))^+ ,
\end{aligned}
\tag{2.15}
\]
where the second equality follows from the induction hypothesis, the third equality follows from the condition in case (b) of the algorithm, and the fifth equality follows from Lemma 2.1.2.

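This concludes the proof of (2.12). Therefore, we have that $\sum_{i \in F^a_n} f(i) = \sum_{j \in D_n} (\alpha_j - d(j, F^a_n))^+ \le \sum_{j \in D_n} \alpha_j$ for any set of clients $D_n$. So, $\mathrm{ALG}_{\mathrm{OFL}}(D_n) \le 2 \cdot \sum_{j \in D_n} \alpha_j$.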

The next lemma shows that $\alpha_k$ is bounded by the length of any path that goes from $j_k$ to any facility in $F^a_{k-1}$.

Lemma 2.1.4. Let $j_e$ and $j_k$ be two clients such that $j_e$ arrived earlier than $j_k$, i.e., $e < k$. Let $F^a_{k-1}$ be the set of facilities opened just before $j_k$ arrived. Then, for every node $x \in V$ we have that:
\[
\alpha_k \le d(j_e, F^a_{k-1}) + d(j_e, x) + d(j_k, x) .
\]

Proof. Remember that $\mathrm{closest}(j, F^a)$ denotes the closest facility to $j$ in $F^a$. We have:
\[
\begin{aligned}
\alpha_k &\le d(j_k, F^a_{k-1}) \\
&\le d(j_k, \mathrm{closest}(j_e, F^a_{k-1})) \\
&\le d(j_k, j_e) + d(j_e, \mathrm{closest}(j_e, F^a_{k-1})) \\
&\le d(j_k, x) + d(j_e, x) + d(j_e, F^a_{k-1}) ,
\end{aligned}
\tag{2.16}
\]
where the first inequality follows from the condition in case (a) of the algorithm, and the other inequalities follow from the definition of $d(j_k, F^a_{k-1})$ as a minimum and from the triangle inequality.

The next lemma bounds the cost of the dual variables α.

Lemma 2.1.5. Let $D \subseteq D_n$ be any subset of clients, and $i \in F$ be any place in which a facility may be opened. Then:
\[
\frac{f(i)}{2} \ge \sum_{j \in D} \left( \frac{\alpha_j}{2 H_{|D_n|}} - d(j, i) \right) .
\]

Proof. Before proving this lemma, let us define some notation. We denote the clients in $D$ by $\{j_{[1]}, j_{[2]}, \ldots, j_{[n_D]}\}$, where $n_D = |D|$. The indices of the clients respect the order of their arrivals, namely $j_{[1]}$ is the earliest client in $D$ to arrive, and $j_{[n_D]}$ is the latest. Notice that $j_k$ is different from $j_{[k]}$, since the former is the $k$-th client to arrive, and the latter is the $k$-th client among those in $D$. Note that the clients $j_{[k]}$ and $j_{[k+1]}$ may not be consecutive, since other clients that are not in $D$ may have arrived between their arrivals. Furthermore, for each $j_{[k]}$ in $D$ let $[D]_k = \{j_{[1]}, \ldots, j_{[k]}\}$ be the set of the first $k$ clients to be added to $D$.

Consider the behavior of the algorithm when client $j_{[k]}$ arrives. Due to cases (a) and (b) of the algorithm, for any $i \in F$ we have:
\[
\begin{aligned}
f(i) &\ge \alpha_{[k]} - d(j_{[k]}, i) + \sum_{j \in D_{[k]-1}} (d(j, F^a_{[k]-1}) - d(j, i))^+ \\
&\ge \alpha_{[k]} - d(j_{[k]}, i) + \sum_{j \in [D]_{k-1}} (d(j, F^a_{[k]-1}) - d(j, i))^+ \\
&\ge \alpha_{[k]} - d(j_{[k]}, i) + \sum_{j \in [D]_{k-1}} (\alpha_{[k]} - d(j_{[k]}, i) - 2 d(j, i)) \\
&= k (\alpha_{[k]} - d(j_{[k]}, i)) - 2 \sum_{j \in [D]_{k-1}} d(j, i) ,
\end{aligned}
\tag{2.17}
\]
where the second inequality follows because $[D]_{k-1} \subseteq D_{[k]-1}$, and the last inequality follows from Lemma 2.1.4.

This implies a lower bound for the cost of opening any facility. Dividing the previous inequality by $k$, and summing up the two sides for every client in $D$, we have:
\[
\sum_{k=1}^{n_D} \frac{f(i)}{k} \ge \sum_{k=1}^{n_D} (\alpha_{[k]} - d(j_{[k]}, i)) - 2 \sum_{k=1}^{n_D} \sum_{j \in [D]_{k-1}} \frac{d(j, i)}{k} . \tag{2.18}
\]

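Notice that:
\[
\begin{aligned}
\sum_{k=1}^{n_D} \sum_{j \in [D]_{k-1}} \frac{d(j, i)}{k} &= \sum_{k=1}^{n_D} \sum_{l=1}^{k-1} \frac{d(j_{[l]}, i)}{k} \\
&= \sum_{l=1}^{n_D - 1} \sum_{k=l+1}^{n_D} \frac{d(j_{[l]}, i)}{k} \\
&= \sum_{l=1}^{n_D - 1} d(j_{[l]}, i) \left( \sum_{k=1}^{n_D} \frac{1}{k} - \sum_{k=1}^{l} \frac{1}{k} \right) \\
&= \sum_{l=1}^{n_D} d(j_{[l]}, i) (H_{n_D} - H_l) .
\end{aligned}
\tag{2.19}
\]
Also, we have that:
\[
\sum_{k=1}^{n_D} \frac{f(i)}{k} = H_{n_D} f(i) . \tag{2.20}
\]

Using equations (2.19) and (2.20) together with inequality (2.18), we have that:
\[
\begin{aligned}
H_{n_D} f(i) &\ge \sum_{k=1}^{n_D} (\alpha_{[k]} - d(j_{[k]}, i)) - 2 \sum_{k=1}^{n_D} \sum_{j \in [D]_{k-1}} \frac{d(j, i)}{k} \\
&= \sum_{k=1}^{n_D} (\alpha_{[k]} - d(j_{[k]}, i)) - 2 \sum_{l=1}^{n_D} d(j_{[l]}, i) (H_{n_D} - H_l) \\
&= \sum_{k=1}^{n_D} \left( \alpha_{[k]} - d(j_{[k]}, i) - 2 d(j_{[k]}, i) (H_{n_D} - H_k) \right) \\
&= \sum_{k=1}^{n_D} (\alpha_{[k]} - 2 H_{n_D} d(j_{[k]}, i)) + \sum_{k=1}^{n_D} (2 H_k - 1) d(j_{[k]}, i) \\
&\ge \sum_{k=1}^{n_D} (\alpha_{[k]} - 2 H_{n_D} d(j_{[k]}, i)) \\
&= \sum_{j \in D} (\alpha_j - 2 H_{n_D} d(j, i)) .
\end{aligned}
\tag{2.21}
\]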

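Dividing the two sides of this inequality by $2 H_{|D_n|}$, we have that:
\[
\frac{f(i)}{2} \ge \frac{H_{n_D} f(i)}{2 H_{|D_n|}} \ge \sum_{j \in D} \left( \frac{\alpha_j}{2 H_{|D_n|}} - \frac{H_{n_D} d(j, i)}{H_{|D_n|}} \right) \ge \sum_{j \in D} \left( \frac{\alpha_j}{2 H_{|D_n|}} - d(j, i) \right) . \tag{2.22}
\]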
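The exchange of summation order behind (2.19) can be sanity-checked with exact rational arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the list `dists` stands for the values $d(j_{[1]}, i), \ldots, d(j_{[n_D]}, i)$, assumed here to be integers so that `Fraction` can represent every term exactly.

```python
from fractions import Fraction


def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n as an exact fraction (H_0 = 0)."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))


def double_sum(dists):
    """Left-hand side of (2.19): sum over k of sum over l < k of d_l / k."""
    n = len(dists)
    return sum(
        (Fraction(dists[l - 1], k) for k in range(1, n + 1) for l in range(1, k)),
        Fraction(0),
    )


def harmonic_form(dists):
    """Right-hand side of (2.19): sum over l of d_l * (H_n - H_l)."""
    n = len(dists)
    h_n = harmonic(n)
    return sum(
        (dists[l - 1] * (h_n - harmonic(l)) for l in range(1, n + 1)),
        Fraction(0),
    )


# Hypothetical distances from five clients of D to a fixed facility i.
example = [3, 1, 4, 1, 5]
print(double_sum(example) == harmonic_form(example))
```

The term for $l = n_D$ on the right-hand side is zero, which is why the upper limit can be written as either $n_D - 1$ or $n_D$.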

Now we are ready to prove Theorem 2.1.1. Let $\alpha$ be the dual variables of a solution produced by the algorithm. For a dual solution to be feasible, its variables must respect all the dual constraints. The compact version of the OFL dual formulation has one set of constraints, that is $\sum_{j \in D} (\alpha_j - d(j, i))^+ \le f(i)$, for any $i$ in $F$. Lemma 2.1.5 shows that if we divide $\alpha$ by $2 H_n$, the rescaled variables respect these dual constraints.

Proof. (Theorem 2.1.1) Using Lemmas 2.1.3 and 2.1.5, and the fact that the cost of a feasible dual solution is a lower bound on the cost of any primal solution, we have that:
\[
\mathrm{ALG}_{\mathrm{OFL}}(D_n) \le 2 \sum_{j \in D_n} \alpha_j = 4 H_n \sum_{j \in D_n} \frac{\alpha_j}{2 H_n} \le 4 H_n \, \mathrm{OPT}_{\mathrm{FL}}(D_n) \le 4 \log n \, \mathrm{OPT}_{\mathrm{FL}}(D_n) .
\]

2.2 Online Prize-Collecting Facility Location problem

The Online Prize-Collecting Facility Location problem (OPFL) is a generalization of the OFL in which some clients may be left unconnected by paying a penalty. Another way to think about this problem is that every client has a prize that can only be collected if the client is connected. We present and analyze a primal-dual O(log n)-competitive algorithm for the OPFL by San Felice, Cheung, Lee and Williamson [6], which is inspired by the OFL Algorithm.

The input for the OPFL is a complete graph $G = (V, E)$, a distance function $d : E \to \mathbb{R}^+$ that respects the triangle inequality, a set of nodes in which a facility may be opened $F \subseteq V$, a facility opening cost function $f : F \to \mathbb{R}^+$, a penalty cost function $p : V \to \mathbb{R}^+$, and a set of clients $D \subseteq V$. Henceforth, we shall denote the non-online part of the input for the OPFL by a tuple $(G, d, F, f, p)$, leaving implicit that $G = (V, E)$.

We assume that initially all facilities are closed. The clients in D arrive one at a time, and each one that arrives must be served before the next one does. To serve a client the algorithm may pay a penalty, and leave it unconnected, connect it to a previously opened facility, or open a new facility and connect the client to it. All decisions of the algorithm are irrevocable. This means that the algorithm cannot remove from the current solution any facility previously opened, change to which facility a client is connected or connect a client that was left unconnected, even if a closer facility was opened later.

We want to minimize the total cost, which is the sum of the cost of the open facilities $F^a$, plus the distance of each client $j$ to its assigned open facility $a(j)$ in $F^a$, or its penalty cost $p(j)$, in case it was not connected ($a(j) = \text{null}$). More precisely, we want $F^a$ and $a$ that minimize:
\[
\sum_{i \in F^a} f(i) + \sum_{j \in D^c} d(j, a(j)) + \sum_{j \in D^p} p(j) , \tag{2.23}
\]
where $D^c$ is the set of clients that were connected, and $D^p$ is the set of clients that were not connected.
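As an illustration of objective (2.23), the following minimal sketch evaluates the cost of a given solution. The function name `solution_cost` and the dictionary-based representation are assumptions made for this example only: `f[i]` is the opening cost of site `i`, `d[j][i]` the distance from client `j` to site `i`, `p[j]` the penalty of client `j`, and `assignment[j]` is either a facility or `None` (playing the role of null).

```python
def solution_cost(open_facilities, assignment, f, d, p):
    """Evaluate objective (2.23) for a given solution (F^a, a).

    open_facilities: iterable with the facilities in F^a
    assignment:      dict client -> facility, or None if the penalty is paid
    f, d, p:         opening costs, distances and penalties (hypothetical
                     dictionary layout chosen for this sketch)
    """
    facility_cost = sum(f[i] for i in open_facilities)
    # Clients in D^c pay the distance to their assigned facility.
    connection_cost = sum(d[j][i] for j, i in assignment.items() if i is not None)
    # Clients in D^p pay their penalty instead.
    penalty_cost = sum(p[j] for j, i in assignment.items() if i is None)
    return facility_cost + connection_cost + penalty_cost


# Hypothetical instance: one open facility, two connected clients, one penalty.
f = {"u": 3, "v": 5}
d = {"a": {"u": 1, "v": 4}, "b": {"u": 2, "v": 1}, "c": {"u": 9, "v": 7}}
p = {"a": 10, "b": 10, "c": 4}
print(solution_cost({"u"}, {"a": "u", "b": "u", "c": None}, f, d, p))
```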

2.2.1 Linear Programming Relaxation and Its Dual

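We present a well-known linear programming relaxation of the OPFL
\[
\begin{array}{lll}
\min & \sum_{i \in F} f(i) y_i + \sum_{j \in D} \sum_{i \in F} d(j, i) x_{ji} + \sum_{j \in D} p(j) z_j & \\
\text{s.t.} & x_{ji} \le y_i & \text{for } j \in D \text{ and } i \in F, \\
& \sum_{i \in F} x_{ji} + z_j \ge 1 & \text{for } j \in D, \\
& y_i \ge 0, \; x_{ji} \ge 0, \; z_j \ge 0 & \text{for } j \in D \text{ and } i \in F,
\end{array}
\]
and its dual
\[
\begin{array}{lll}
\max & \sum_{j \in D} \alpha_j & \\
\text{s.t.} & \sum_{j \in D} \beta_{ji} \le f(i) & \text{for } i \in F, \\
& \alpha_j - \beta_{ji} \le d(j, i) & \text{for } j \in D \text{ and } i \in F, \\
& \alpha_j \le p(j) & \text{for } j \in D, \\
& \alpha_j \ge 0, \; \beta_{ji} \ge 0 & \text{for } j \in D \text{ and } i \in F.
\end{array}
\]

Since the second restriction of the dual ensures that $\alpha_j - d(j, i) \le \beta_{ji}$, we can eliminate the variables $\beta_{ji}$ in the first restriction, obtaining the following compact and equivalent dual:
\[
\begin{array}{lll}
\max & \sum_{j \in D} \alpha_j & \\
\text{s.t.} & \sum_{j \in D} (\alpha_j - d(j, i))^+ \le f(i) & \text{for } i \in F, \\
& \alpha_j \le p(j) & \text{for } j \in D, \\
& \alpha_j \ge 0 & \text{for } j \in D.
\end{array}
\]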
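The constraints of the compact dual are easy to test directly for a candidate vector $\alpha$. The sketch below is illustrative only: the function name `is_dual_feasible`, the tolerance `tol`, and the dictionary layout (`alpha[j]`, `f[i]`, `d[j][i]`, `p[j]`) are assumptions of this example, not notation from the text.

```python
def is_dual_feasible(alpha, f, d, p, tol=1e-9):
    """Check the three constraint families of the compact OPFL dual.

    alpha: dict client -> dual value alpha_j
    f:     dict site -> opening cost f(i)
    d:     dict client -> dict site -> distance d(j, i)
    p:     dict client -> penalty p(j)
    tol:   numerical slack (an assumption of this sketch)
    """
    # 0 <= alpha_j <= p(j) for every client j.
    for j, a in alpha.items():
        if a < -tol or a > p[j] + tol:
            return False
    # sum_j (alpha_j - d(j, i))^+ <= f(i) for every site i.
    for i, cost in f.items():
        offered = sum(max(alpha[j] - d[j][i], 0) for j in alpha)
        if offered > cost + tol:
            return False
    return True


# Hypothetical instance with a single site "u" and two clients.
f = {"u": 2}
d = {"a": {"u": 1}, "b": {"u": 1}}
p = {"a": 5, "b": 5}
print(is_dual_feasible({"a": 2, "b": 2}, f, d, p))
print(is_dual_feasible({"a": 3, "b": 2}, f, d, p))
```

In the second call the clients offer $2 + 1 = 3 > f(u)$, so the vector is infeasible; halving it restores feasibility, which is the kind of rescaling the analysis relies on.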

2.2.2 OPFL Algorithm

In this subsection we describe a primal-dual algorithm for the OPFL, which is inspired by the OFL Algorithm.


Input: $(G, d, f, p, F)$

$D \leftarrow \emptyset$; $F^a \leftarrow \emptyset$;
while a new client $j'$ arrives do
    increase $\alpha_{j'}$ until one of the following happens:
        (a) $\alpha_{j'} = d(j', i)$ for some $i \in F^a$;
        (b) $f(i) = (\alpha_{j'} - d(j', i)) + \sum_{j \in D} (\min\{d(j, F^a), p(j)\} - d(j, i))^+$ for some $i \in F \setminus F^a$;
        (c) $\alpha_{j'} = p(j')$ (in this case $i$ is chosen to be null);
    $F^a \leftarrow F^a \cup \{i\}$; $D \leftarrow D \cup \{j'\}$; $a(j') \leftarrow i$;
end
return $(F^a, a)$;

Algorithm 2: OPFL Algorithm.

Let us give a high level description of the algorithm, whose pseudo-code is presented in Algorithm 2. Let (G, d, F, f, p) be the non-online part of the input for the OPFL Algorithm, and recall that it receives one client at each time and must serve it.

Every time a new client $j'$ arrives the algorithm increases its dual variable $\alpha_{j'}$ until (a) it is connected to a facility that is already open, (b) a new facility is opened and it is connected to it, or (c) the client is left unconnected by paying its penalty $p(j')$. Note that in case (b) the cost of the new facility $i$ is paid by the $\alpha$ variable of the current client $j'$, and by contributions from the other clients that are closer to $i$ than to previously opened facilities.

The solution built by the algorithm consists of a set of open (or active) facilities $F^a$, and a function $a$ that assigns each client to an open facility. If a client $j$ is not connected, then $a(j)$ is set to null.
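The description above can be turned into a short executable sketch. Instead of raising $\alpha_{j'}$ continuously, it computes for each possible event the value of $\alpha_{j'}$ at which that event would fire and takes the smallest one. Several choices here are assumptions of this sketch rather than part of Algorithm 2: the function name `opfl_sketch`, the dictionary inputs, the tie-breaking order (case (a) before (b) before (c)), the assumption that clients are distinct, and the clamp that keeps the residual opening cost non-negative.

```python
def opfl_sketch(clients, sites, f, d, p):
    """Event-based sketch of the OPFL primal-dual algorithm.

    clients: list of distinct clients in arrival order
    sites:   list of candidate facility sites F
    f[i]:    opening cost, d[u][v]: metric distance, p[j]: penalty
    Returns (open_facs, assign, alpha).
    """
    open_facs = []  # F^a, in opening order
    served = []     # D, the clients served so far
    assign = {}     # a(j); None plays the role of null
    alpha = {}      # dual variable of each served client

    def dist_to_open(j):
        # d(j, F^a); infinite while no facility is open.
        return min((d[j][i] for i in open_facs), default=float("inf"))

    for j_new in clients:
        # Each event is (alpha value at which it fires, tie rank, facility).
        events = [(p[j_new], 2, None)]  # case (c)
        for i in sites:
            if i in open_facs:
                events.append((d[j_new][i], 0, i))  # case (a)
            else:
                # Amount earlier clients contribute towards opening i.
                paid = sum(
                    max(min(dist_to_open(j), p[j]) - d[j][i], 0) for j in served
                )
                # Case (b): alpha = d(j', i) + f(i) - paid (clamped at zero).
                events.append((d[j_new][i] + max(f[i] - paid, 0), 1, i))
        value, rank, i = min(events, key=lambda e: (e[0], e[1]))
        if rank == 1:
            open_facs.append(i)  # a new facility is opened
        alpha[j_new] = value
        assign[j_new] = i
        served.append(j_new)
    return open_facs, assign, alpha


# Hypothetical instance on a line: sites u, v and clients a, b, c.
position = {"u": 0, "v": 10, "a": 1, "b": 2, "c": 11}
d = {x: {y: abs(position[x] - position[y]) for y in position} for x in position}
f = {"u": 3, "v": 100}
p = {"a": 10, "b": 10, "c": 4}
print(opfl_sketch(["a", "b", "c"], ["u", "v"], f, d, p))
```

On this instance the first client opens `u`, the second connects to it, and the third pays its penalty because both connecting to `u` and opening `v` would require a larger dual value than $p(c)$.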

2.2.3 Competitive Analysis of the Algorithm

In this subsection we analyze the competitiveness of the OPFL Algorithm.

During the analysis we denote the set of clients by $D_n$, and the total number of clients by $n = |D_n|$. Let $D^c_n$ be the set of clients that the algorithm connected to some facility, and $D^p_n$ be the set of clients for which the algorithm paid the penalty. Notice that $D^c_n \cap D^p_n = \emptyset$. Consider the solution $(F^a_n, a)$ computed by the OPFL Algorithm to serve $D_n$. So
\[
\mathrm{ALG}_{\mathrm{OPFL}}(D_n) = \sum_{i \in F^a_n} f(i) + \sum_{j \in D^c_n} d(j, a(j)) + \sum_{j \in D^p_n} p(j) \tag{2.24}
\]
is the cost paid by the OPFL Algorithm to serve $D_n$.

Similarly, consider the PFL offline optimal solution $(F^*_n, a^*)$ for $D_n$, with which we are comparing. So
\[
\mathrm{OPT}_{\mathrm{PFL}}(D_n) = \sum_{i \in F^*_n} f(i) + \sum_{j \in D_n : a^*(j) \ne \text{null}} d(j, a^*(j)) + \sum_{j \in D_n : a^*(j) = \text{null}} p(j) \tag{2.25}
\]
is the cost of the optimal solution.

The next theorem is the main result of this subsection.

Theorem 2.2.1.
\[
\mathrm{ALG}_{\mathrm{OPFL}}(D_n) \le 6 \log n \, \mathrm{OPT}_{\mathrm{PFL}}(D_n) .
\]

In order to prove this, we need several auxiliary results. First we prove the following.

Lemma 2.2.1. For any $i$ in $F$ and any iteration $k$, we have that:
\[
f(i) \ge \sum_{j \in D_k} (\min\{d(j, F^a_k), p(j)\} - d(j, i))^+ . \tag{2.26}
\]

We omit this proof because it is very similar to the proof of Lemma 2.1.1.

Lemma 2.2.2. Suppose that at iteration $k$ a new facility $i_k$ is opened. Then, for any client $j$ in $D_{k-1}$ we have that:
\[
(\alpha_j - d(j, F^a_k))^+ = (\alpha_j - d(j, F^a_{k-1}))^+ + (\min\{d(j, F^a_{k-1}), p(j)\} - d(j, i_k))^+ .
\]

Proof. We show this by analyzing two cases. In the first case suppose that $d(j, F^a_{k-1}) \le p(j)$. Considering that $j$ was connected by case (a) or (b) of the algorithm, and using Lemma 2.2.1, we may conclude that $\alpha_j \ge d(j, F^a_{k-1}) = \min\{d(j, F^a_{k-1}), p(j)\}$. Thus, it suffices to show that:
\[
(\alpha_j - d(j, F^a_k)) = (\alpha_j - d(j, F^a_{k-1})) + (d(j, F^a_{k-1}) - d(j, i_k))^+ . \tag{2.27}
\]
If $i_k$ is not the closest facility to $j$ in $F^a_k$ (i.e. $d(j, i_k) > d(j, F^a_{k-1})$), then the previous equation holds, because $d(j, F^a_{k-1}) = d(j, F^a_k)$ and $(d(j, F^a_{k-1}) - d(j, i_k))^+ = 0$. Otherwise (i.e. $d(j, i_k) \le d(j, F^a_{k-1})$), we have that $(d(j, F^a_{k-1}) - d(j, i_k))^+ = (d(j, F^a_{k-1}) - d(j, i_k))$ and $d(j, i_k) = d(j, F^a_k)$. Thus, (2.27) holds, and this concludes the first case.

In the second case suppose that $d(j, F^a_{k-1}) > p(j)$. By case (c) of the algorithm, we have that $j$ is not connected, $\alpha_j = p(j) = \min\{d(j, F^a_{k-1}), p(j)\}$, and $(\alpha_j - d(j, F^a_{k-1}))^+ = 0$. Thus, it suffices to show that:
\[
(\alpha_j - d(j, F^a_k))^+ = (p(j) - d(j, i_k))^+ . \tag{2.28}
\]
If $d(j, i_k) \ge p(j)$, then the previous equation holds, because $d(j, F^a_k) \ge p(j) = \alpha_j$ and so we have $(\alpha_j - d(j, F^a_k))^+ = 0 = (p(j) - d(j, i_k))^+$.

Otherwise, we have $d(j, i_k) < p(j)$. Since in the second case $p(j) < d(j, F^a_{k-1})$, we have that $i_k$ is the closest facility to $j$ in $F^a_k$. So, $d(j, F^a_k) = d(j, i_k)$ and we have $(\alpha_j - d(j, F^a_k))^+ = (\alpha_j - d(j, i_k)) = (p(j) - d(j, i_k))$. Thus, (2.28) holds, and this concludes the second case. Since both cases hold, the result follows.

Now we bound the cost of the algorithm using the dual variables.

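Lemma 2.2.3.
\[
\mathrm{ALG}_{\mathrm{OPFL}}(D_n) \le 2 \cdot \sum_{j \in D_n} \alpha_j .
\]

Proof. We show that the inequalities
\[
\sum_{j \in D^c_n} d(j, a(j)) + \sum_{j \in D^p_n} p(j) \le \sum_{j \in D_n} \alpha_j \tag{2.29}
\]
and
\[
\sum_{i \in F^a_n} f(i) \le \sum_{j \in D_n} \alpha_j \tag{2.30}
\]
hold, which implies the desired result.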

First we show that (2.29) holds. Consider a generic iteration, let us say $k$, in which client $j_k$ arrives. By the way the algorithm works, when $a(j_k)$ is set for the first time, it never changes again. Moreover, the value of $\alpha_k$ never decreases, being defined in case (a), (b) or (c) of the algorithm. If case (a) happens then $d(j_k, a(j_k)) = \alpha_k$ and $j_k \in D^c_k$. If case (b) happens then $d(j_k, a(j_k)) \le \alpha_k$ and $j_k \in D^c_k$. If case (c) happens then $p(j_k) \le \alpha_k$ and $j_k \in D^p_k$. Since $D^c_k \subseteq D^c_n$, $D^p_k \subseteq D^p_n$, and $D^c_k \cap D^p_k = \emptyset$, we have that $\sum_{j \in D^c_n} d(j, a(j)) + \sum_{j \in D^p_n} p(j) \le \sum_{j \in D_n} \alpha_j$.

Now let us show that (2.30) holds. To do so, we show that after the algorithm serves the $k$-th client, the following equation holds:
\[
\sum_{i \in F^a_k} f(i) = \sum_{j \in D_k} (\alpha_j - d(j, F^a_k))^+ = \sum_{l=1}^{k} (\alpha_l - d(j_l, F^a_k))^+ . \tag{2.31}
\]

We may think of $\alpha_j$ as the value that client $j$ offers for opening facilities close to it, and $d(j, F^a_k)$ as the part of $\alpha_j$ still available for paying for new facilities. So, $(\alpha_j - d(j, F^a_k))^+$ is the part of $\alpha_j$ already used in the opening of one or more facilities. Under this interpretation, a client $j$ contributes to the opening of a facility $i$ only if $i$ is closer to $j$ than any of the already opened facilities and the distance from $j$ to $i$ is less than its prize $p(j)$. So, a client $j$ will contribute to opening a facility $i$ at iteration $k$ only if $i$ is inside the ball of center $j$ and radius $\min\{d(j, F^a_k), p(j)\}$.

The proof of (2.31) is by induction on k.

Base: In the beginning, equation (2.31) holds because $F^a_0 = D_0 = \emptyset$.

Induction Hypothesis: Assume that after the first $k - 1$ clients were served, equation (2.31) with $k$ replaced by $k - 1$ holds, i.e.
\[
\sum_{i \in F^a_{k-1}} f(i) = \sum_{j \in D_{k-1}} (\alpha_j - d(j, F^a_{k-1}))^+ = \sum_{l=1}^{k-1} (\alpha_l - d(j_l, F^a_{k-1}))^+ .
\]

Step: We show that equation (2.31) holds after the k-th client is served by the

algorithm.

Since a client is served according to case (a), (b) or (c) of the algorithm, we analyze each one of them. First, we assume that case (c) happened. In this case we have that client $j_k$ is not connected and no new facility is opened. Since $\alpha_k = p(j_k)$ and $d(j_k, F^a_k) \ge p(j_k)$, neither the left-hand side nor the right-hand side of equation (2.31) changes, and the result follows from the induction hypothesis.

Suppose now that one of the other cases occurs and the algorithm connects $j_k$ to the facility $i_k$ when serving it. Notice that $i_k$ may already be open. Assume that case (a) happened. In this case we have that $i_k \in F^a_{k-1}$ and, hence, $F^a_k = F^a_{k-1}$. So:
\[
\sum_{i \in F^a_k} f(i) = \sum_{i \in F^a_{k-1}} f(i) = \sum_{j \in D_{k-1}} (\alpha_j - d(j, F^a_{k-1}))^+ = \sum_{j \in D_k} (\alpha_j - d(j, F^a_k))^+ , \tag{2.32}
\]
where the first equality holds because $F^a_k = F^a_{k-1}$, the second equality follows from the induction hypothesis, and the last equality holds because $F^a_k = F^a_{k-1}$ and $\alpha_k = d(j_k, F^a_k)$.

Now suppose that case (b) happened. In this case a new facility $i_k$ is opened and we have:
\[
f(i_k) = (\alpha_k - d(j_k, i_k)) + \sum_{j \in D_{k-1}} (\min\{d(j, F^a_{k-1}), p(j)\} - d(j, i_k))^+ . \tag{2.33}
\]

To see that (2.31) holds, note that:
\[
\begin{aligned}
\sum_{i \in F^a_k} f(i) &= \sum_{i \in F^a_{k-1}} f(i) + f(i_k) \\
&= \sum_{j \in D_{k-1}} (\alpha_j - d(j, F^a_{k-1}))^+ + f(i_k) \\
&= \sum_{j \in D_{k-1}} (\alpha_j - d(j, F^a_{k-1}))^+ + (\alpha_k - d(j_k, i_k)) + \sum_{j \in D_{k-1}} (\min\{d(j, F^a_{k-1}), p(j)\} - d(j, i_k))^+ \\
&= \sum_{j \in D_{k-1}} \left[ (\alpha_j - d(j, F^a_{k-1}))^+ + (\min\{d(j, F^a_{k-1}), p(j)\} - d(j, i_k))^+ \right] + (\alpha_k - d(j_k, i_k)) \\
&= \sum_{j \in D_{k-1}} (\alpha_j - d(j, F^a_k))^+ + (\alpha_k - d(j_k, i_k))^+ \\
&= \sum_{j \in D_{k-1}} (\alpha_j - d(j, F^a_k))^+ + (\alpha_k - d(j_k, F^a_k))^+ \\
&= \sum_{j \in D_k} (\alpha_j - d(j, F^a_k))^+ ,
\end{aligned}
\tag{2.34}
\]
where the second equality follows from the induction hypothesis, the third equality follows from the condition in case (b) of the algorithm, and the fifth equality follows from Lemma 2.2.2.

This concludes the proof of (2.31). Therefore, we have that $\sum_{i \in F^a_n} f(i) = \sum_{j \in D_n} (\alpha_j - d(j, F^a_n))^+ \le \sum_{j \in D_n} \alpha_j$ for any set of clients $D_n$. So, $\mathrm{ALG}_{\mathrm{OPFL}}(D_n) \le 2 \cdot \sum_{j \in D_n} \alpha_j$.

The next lemma shows that $\alpha_k$ is bounded by the length of any path that goes from $j_k$ to any facility in $F^a_{k-1}$.

Lemma 2.2.4. Let $j_e$ and $j_k$ be two clients such that $j_e$ arrived earlier than $j_k$, i.e., $e < k$. Let $F^a_{k-1}$ be the set of facilities opened just before $j_k$ arrived. Then, for every node $x \in V$ we have that:
\[
\alpha_k \le d(j_e, F^a_{k-1}) + d(j_e, x) + d(j_k, x) .
\]

We omit this proof because it is identical to the proof of lemma 2.1.4.

The next lemma bounds the cost of the dual variables α of the connected clients.

Lemma 2.2.5. Let $D^c \subseteq D^c_n$ be any subset of connected clients, and $i \in F$ be any place in which a facility may be opened. Then:
\[
\frac{f(i)}{2} \ge \sum_{j \in D^c} \left( \frac{\alpha_j}{2 H_{|D_n|}} - d(j, i) \right) .
\]

Proof. Before proving this lemma, let us define some notation. We denote the clients in $D^c$ by $\{j_{[1]}, j_{[2]}, \ldots, j_{[n_c]}\}$, where $n_c = |D^c|$. The indices of the clients respect the order of their arrivals, namely $j_{[1]}$ is the earliest client in $D^c$ to arrive, and $j_{[n_c]}$ is the latest. Notice that $j_k$ is different from $j_{[k]}$, since the former is the $k$-th client to arrive, and the latter is the $k$-th client among those in $D^c$. Note that the clients $j_{[k]}$ and $j_{[k+1]}$ may not be consecutive, since other clients that are not in $D^c$ may have arrived between their arrivals. Furthermore, for each $j_{[k]}$ in $D^c$ let $[D^c]_k = \{j_{[1]}, \ldots, j_{[k]}\}$ be the set of the first $k$ clients to be added to $D^c$.

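Consider the behavior of the algorithm when client $j_{[k]}$ arrives. Due to cases (a) and (b) of the algorithm, for any $i \in F$ we have:
\[
\begin{aligned}
f(i) &\ge \alpha_{[k]} - d(j_{[k]}, i) + \sum_{j \in D_{[k]-1}} (\min\{d(j, F^a_{[k]-1}), p(j)\} - d(j, i))^+ \\
&\ge \alpha_{[k]} - d(j_{[k]}, i) + \sum_{j \in [D^c]_{k-1}} (\min\{d(j, F^a_{[k]-1}), p(j)\} - d(j, i))^+ \\
&= \alpha_{[k]} - d(j_{[k]}, i) + \sum_{j \in [D^c]_{k-1}} (d(j, F^a_{[k]-1}) - d(j, i))^+ \\
&\ge \alpha_{[k]} - d(j_{[k]}, i) + \sum_{j \in [D^c]_{k-1}} (\alpha_{[k]} - d(j_{[k]}, i) - 2 d(j, i)) \\
&= k (\alpha_{[k]} - d(j_{[k]}, i)) - 2 \sum_{j \in [D^c]_{k-1}} d(j, i) ,
\end{aligned}
\tag{2.35}
\]
where the second inequality follows because $[D^c]_{k-1} \subseteq D_{[k]-1}$, the first equality holds because every connected client $j$ satisfies $d(j, F^a_{[k]-1}) \le d(j, a(j)) \le \alpha_j \le p(j)$, and the last inequality follows from Lemma 2.2.4.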

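This implies a lower bound for the cost of opening any facility. Dividing the previous inequality by $k$, and summing up the two sides for every client in $D^c$, we have:
\[
\sum_{k=1}^{n_c} \frac{f(i)}{k} \ge \sum_{k=1}^{n_c} (\alpha_{[k]} - d(j_{[k]}, i)) - 2 \sum_{k=1}^{n_c} \sum_{j \in [D^c]_{k-1}} \frac{d(j, i)}{k} . \tag{2.36}
\]
Notice that:
\[
\begin{aligned}
\sum_{k=1}^{n_c} \sum_{j \in [D^c]_{k-1}} \frac{d(j, i)}{k} &= \sum_{k=1}^{n_c} \sum_{l=1}^{k-1} \frac{d(j_{[l]}, i)}{k} \\
&= \sum_{l=1}^{n_c - 1} \sum_{k=l+1}^{n_c} \frac{d(j_{[l]}, i)}{k} \\
&= \sum_{l=1}^{n_c - 1} d(j_{[l]}, i) \left( \sum_{k=1}^{n_c} \frac{1}{k} - \sum_{k=1}^{l} \frac{1}{k} \right) \\
&= \sum_{l=1}^{n_c} d(j_{[l]}, i) (H_{n_c} - H_l) .
\end{aligned}
\tag{2.37}
\]
Also we have that:
\[
\sum_{k=1}^{n_c} \frac{f(i)}{k} = H_{n_c} f(i) . \tag{2.38}
\]

Using equations (2.37) and (2.38) together with inequality (2.36), we have that:
\[
\begin{aligned}
H_{n_c} f(i) &\ge \sum_{k=1}^{n_c} (\alpha_{[k]} - d(j_{[k]}, i)) - 2 \sum_{k=1}^{n_c} \sum_{j \in [D^c]_{k-1}} \frac{d(j, i)}{k} \\
&= \sum_{k=1}^{n_c} (\alpha_{[k]} - d(j_{[k]}, i)) - 2 \sum_{l=1}^{n_c} d(j_{[l]}, i) (H_{n_c} - H_l) \\
&= \sum_{k=1}^{n_c} \left( \alpha_{[k]} - d(j_{[k]}, i) - 2 d(j_{[k]}, i) (H_{n_c} - H_k) \right) \\
&= \sum_{k=1}^{n_c} (\alpha_{[k]} - 2 H_{n_c} d(j_{[k]}, i)) + \sum_{k=1}^{n_c} (2 H_k - 1) d(j_{[k]}, i) \\
&\ge \sum_{k=1}^{n_c} (\alpha_{[k]} - 2 H_{n_c} d(j_{[k]}, i)) \\
&= \sum_{j \in D^c} (\alpha_j - 2 H_{n_c} d(j, i)) .
\end{aligned}
\tag{2.39}
\]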

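Dividing the two sides of this inequality by $2 H_{|D_n|}$, we have that:
\[
\frac{f(i)}{2} \ge \frac{H_{n_c} f(i)}{2 H_{|D_n|}} \ge \sum_{j \in D^c} \left( \frac{\alpha_j}{2 H_{|D_n|}} - \frac{H_{n_c} d(j, i)}{H_{|D_n|}} \right) \ge \sum_{j \in D^c} \left( \frac{\alpha_j}{2 H_{|D_n|}} - d(j, i) \right) , \tag{2.40}
\]
which concludes the proof.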

Let $\alpha$ be the dual variables of a solution produced by the algorithm. For a dual solution to be feasible, its variables must respect all the dual constraints. The compact version of the OPFL dual formulation has two sets of constraints. One is $\alpha_j \le p(j)$ for any $j$ in $D$. Due to case (c) of the algorithm, we have that $\alpha$ respects these constraints.

The other set of constraints is $\sum_{j \in D} (\alpha_j - d(j, i))^+ \le f(i)$, for any $i$ in $F$. Let $\alpha^c$ be the dual variables corresponding to clients that are in $D^c_n$. Lemma 2.2.5 shows that if we divide $\alpha^c$ by $2 H_n$, the rescaled variables respect the dual constraints.

Thus, we know how to make the dual variables $\alpha_j$ feasible for each client $j$ in $D^c_n$. It remains to show this for the clients in $D^p_n$. That is what we do in the following theorem.

Theorem 2.2.2. Throughout the execution of the algorithm, the following inequality holds:
\[
\frac{f(i)}{3} \ge \sum_{j \in D^p} \left( \frac{\alpha_j}{3 H_{|D_n|}} - d(j, i) \right) ,
\]
for any $i$ in $F$ in which a facility may be opened, and any subset of penalty clients $D^p \subseteq D^p_n$.

Before proving the above theorem, let us define some notation and prove some auxiliary lemmas. We denote the clients in $D^p$ by $\{j_{[1]}, j_{[2]}, \ldots, j_{[n_p]}\}$, where $n_p = |D^p|$. The indices of the clients respect the order of their arrivals, namely $j_{[1]}$ is the earliest client in $D^p$ to arrive, and $j_{[n_p]}$ is the latest. Furthermore, for each $j_{[k]} \in D^p$ define $D^p_{[k]} = \{j_{[1]}, \ldots, j_{[k]}\}$ to be the set of the first $k$ clients to be added to $D^p$. Also, define $\bar{D}^p_{[k]} = \{j \in D^p_{[k]} : \alpha_j \ge d(j, F^a_{[k]})\}$ and $\tilde{D}^p_{[k]} = D^p_{[k]} \setminus \bar{D}^p_{[k]}$.

Lemma 2.2.6. For $k = 1, \ldots, n_p$ we have $\bar{D}^p_{[k-1]} \subseteq \bar{D}^p_{[k]}$.

Proof. Since $j_{[k]}$ arrives later than $j_{[k-1]}$, we have $F^a_{[k-1]} \subseteq F^a_{[k]}$. Thus we have $d(j, F^a_{[k]}) \le d(j, F^a_{[k-1]})$ for any fixed $j$. Since $j \in \bar{D}^p_{[k-1]}$, we have $\alpha_j \ge d(j, F^a_{[k-1]}) \ge d(j, F^a_{[k]})$. Thus $j \in \bar{D}^p_{[k]}$.

Lemma 2.2.7. For any $k = 1, \ldots, n_p$, the statements $\bar{D}^p_{[k]} = \bar{D}^p_{[k]-1}$ and $F^a_{[k]} = F^a_{[k]-1}$ always hold.

Proof. Note that at iteration $[k]$ the client $j_{[k]}$ is added to $D^p_{[k]}$. This implies that, at iteration $[k]$, case (c) of the algorithm occurred and no facility was opened. So $F^a_{[k]} = F^a_{[k]-1}$ and $\alpha_{[k]} < d(j_{[k]}, F^a_{[k]})$. Thus, clients in $\tilde{D}^p_{[k]-1}$ do not move to $\bar{D}^p_{[k]}$, and $j_{[k]}$ is added to $\tilde{D}^p_{[k]}$, i.e. $\tilde{D}^p_{[k]} = \tilde{D}^p_{[k]-1} \cup \{j_{[k]}\}$.

Notice that the similar statement $\bar{D}^p_{[k]} = \bar{D}^p_{[k-1]}$ may not be true, because there may be clients not in $D^p$ that arrived between $j_{[k-1]}$ and $j_{[k]}$ and were responsible for the opening of some facilities, and the opening of these might have moved clients from $\tilde{D}^p_{[k-1]}$ to $\bar{D}^p_{[k]}$.

Lemma 2.2.8. For any $i \in F$ and any $k = 1, \ldots, n_p$, the following inequality holds throughout the execution of the algorithm:
\[
f(i) \ge (\alpha_{[k]} - d(j_{[k]}, i))^+ + \sum_{j \in D^p_{[k]-1}} (\min\{d(j, F^a_{[k]}), \alpha_j\} - d(j, i))^+ .
\]

Proof. First notice that this inequality holds for all $i \in F$ closed upon the arrival of $j_{[k]} \in D^p$, from criteria (a) and (b) of the algorithm, Lemma 2.2.1, and the fact that $\alpha_j = p(j)$ for any $j$ in $D^p$. It actually also holds for any facility $i \in F$ that is already open upon the arrival of $j_{[k]}$. This is because $\alpha_{[k]} \le d(j_{[k]}, i)$, from criterion (a), and $d(j, F^a_{[k]}) \le d(j, i)$ for all $j$ in $D^p_{[k]}$, due to the fact that $i \in F^a_{[k]}$. Thus, we know that the
