Distributed Inference in Sensor Networks

(1)

Distributed Inference in Sensor Networks

João Manuel Melo Silva Gueifão

Thesis to obtain the Master of Science Degree in

Electrical and Computer Engineering

Supervisor: Professor João Manuel de Freitas Xavier

Examination Committee

Chairperson: Professor João Fernando Cardoso Silva Sequeira Supervisor: Professor João Manuel de Freitas Xavier

Member of the Committee: Professor Rodrigo Martins de Matos Ventura

December 2015

(2)

(3)

Dedicated to my parents, which enabled me to achieve what I am now

(4)

(5)

Resumo

Consideramos o problema que consiste em encontrar o caminho, com um determinado comprimento, que acumula a maior retribuiç ão num grafo cujos n ós cont êm pr émios.

Este problema surge no t ópico da estimaç ão distribu´ıda de par âmetros e consiste em encontrar o caminho mais informativo, com um certo comprimento, dentro duma rede de sensores.

Concebemos v ários algoritmos que estabelecem v ários compromissos entre complexidade com- putacional e qualidade das soluç ões obtidas. De entre as v árias abordagens tomadas, destacamos duas: (i) uma formulaç ão de um programa linear inteiro que permite a aplicaç ão de qualquer solu- cionador de programas daquele tipo de modo a resolver do problema que temos em m ãos; (ii) um m étodo sub- óptimo no contexto da programaç ão din âmica que replica a performance obtida em (i) mas com um tempo de execuç ão previs´ıvel e com uma muito maior facilidade de implementaç ão.

Discutimos tamb ém a extens ão dos nossos m étodos ao cen ário em que os sensores cont êm pr émios pertencentes a v árias categorias.

Palavras-chave:

redes de sensores, selecç ão de sensores, optimizaç ão, programaç ão din âmica

(6)

(7)

Abstract

We consider the problem of finding the path, with given length, that accumulates the largest reward in a graph in which nodes hold prizes. This problem arises in distributed parameter estimation and corresponds to finding the most informative path, with given length, through a sensor network.

We conceive a variety of algorithms that strike different trade-offs between computational complexity and solution quality. Among the variety of approaches we develop, two stand up: an integer linear program (ILP) formulation that allows the application of any general-purpose ILP solver to solve the problem at hand, and a suboptimal dynamic programming-based method that replicates the ILP performance but has predictable running time and it is far easier to implement.

We also discuss the extension of our methods to the scenario in which nodes own prizes from several categories.

Keywords:

sensor networks, sensor selection, optimization, dynamic programming

(8)

(9)

List of Figures

1.1 A graph with prizes across nodes. Prizes were generated from an uniform distribution

(5,15)except for the jackpot nodes which were further multiplied by100. . . 2

2.1 Running time of the brute-force method versus path length. . . 9

2.2 Prize accumulated by the brute-force method versus path length. . . 9

3.1 Toy graph . . . 12

3.2 A graphG= (V, E)withV ={v1, v2, v3, v4}andE={e1, e2, e3, e4, e5, e6}. . . 13

3.3 The augmented graphG_f= (V_f, E_f)of the graph in figure 3.2. . . 14

3.4 The flow in (3.4) bears two disjoint paths:s→v₁→tandv₃→v₄→v₃. . . 15

3.5 The flow in (3.5) is a circulation. . . 16

3.6 Partial construction of the STG corresponding to the graph in figure 3.2: arce3 appears replicatedK−1times. We considerK= 3layers. . . 17

3.7 Partial construction of the STG corresponding to the graph in figure 3.2: arcse₁, e₂, e₄ ande5are not replicated to avoid overburdening the picture . We considerK= 3layers. 18 3.8 Running time of SCIP and brute-force search versus path length. . . 21

3.9 Running time of SCIP on a graph with100nodes. . . 22

3.10 Prize accumulated by SCIP on a graph of 100 nodes versus path length. . . 22

4.1 Prize accumulated by both the brute-force method and the linear relaxation versus path length. . . 26

4.2 Small graph with prizes next to nodes. . . 27

4.3 Zigzag behavior of a fractional solution to the graph in figure 4.2. We omitted the nodes related tov6andv7to focus on the zigzag phenomenon. . . 28

5.1 Running time of the brute-force search and the iterative hardening method versus path length. . . 30

5.2 Prize accumulated by both the brute-force search and the iterative hardening method versus path length. . . 31

5.3 Running time of the brute-force search and the extended version of the iterative hardening method versus path length. . . 32

(12)

5.4 Prize accumulated by both the brute-force search and the extended version of the iterative hardening method versus path length. . . 33 5.5 Running time of RGS versus path length, for several depths of recursion. . . 36 5.6 Normalized prize accumulated by RGS versus maximum recursion depth, for path lengths

fromK= 1toK= 20. . . 36 5.7 Prize accumulated by both the brute-force search and the RGS method versus path

length, for recursion levelsi= 2andi= 3. . . 37 5.8 Running time of RGS versus path length, for several depths of recursion. . . 37 5.9 Normalized prize accumulated by RGS versus maximum recursion depth, for several path

lengths. . . 38 5.10 Prize accumulated by both the brute-force search and the RGS method versus path

length, for recursion levelsi= 2andi= 3. . . 38 5.11 Running time of FRGS versus path length, for recursion levelsi= 1andi= 2. . . 39 5.12 Prize accumulated by both the brute-force search and the FRGS method versus path

length, for recursion levelsi= 1andi= 2. . . 40

(13)

Chapter 1

Introduction

1.1 The prize accumulation problem

Motivation. Wireless sensor networks assess the physical state of an environment; they consist of measurement units—the sensors—scattered over a region of interest and linked via wireless. Each sensor is small, cheap and battery-operated, and the measurement accuracy of each unit varies from sensor to sensor.

As soon as sensors take measurements, their readings must be processed to squeeze out the rele- vant environment’s parameter, e.g., temperature, humidity or soil pollution.

When tracking physical phenomena with fast dynamics, we cannot profit fromall network sensors—

the time needed to collect the entire network observations and to do the lengthy number-crunching would render the estimate obsolete. In those critical scenarios we must estimate the parameter using few sensors. The obligation of activating only a small subset of sensors also holds for slow-changing phenomena that are sampled scarcely (to save network energy) and that demand quick estimators.

Token-based estimators. The limited onboard battery of each sensor restricts its wireless commu- nication range. As a result, each sensor can reach only a small number of neighbors and this feature favors estimators that are token-based (Kar and Moura [2011], Johansson et al. [2010]): a data-carrying token starts at a given sensor and proceeds to amass observations by hopping from sensor to sensor;

after a fixed number of jumps, the token ends its journey and the parameter is estimated based on the information available in the token. This is a distributed inference method since no fusion center is involved.

In this approach the whole network is not switched on, just the sensors that the token visits. We control the time required to form an estimate by selecting thelengthof the token’s path, defined as the number of visited nodes including multiplicities. For example, if the path starts at node 4, jumps to 7, jumps to 4, jumps to 9 and does a final jump to 17, then this path’s length is five; in this example, node 4 is visited twice (although it contributes only one measurement; on the second visit, node 4 acts as a jumping point for the token to reach node 9). To simplify notation, we will refer to such a path by

(14)

4→7→4→9→17.

Most informative path. To design a token-based estimator we must specify the path, i.e., the se- quence of visited nodes—we assume here that the path’s length is given. All paths with the same length are not equivalent as different sensors exhibit different measurement accuracies. We seek the most informative path, the one passing through the most accurate sensors.

Problem statement. We model the sensor network as a graphG= (V, E)whereV is the set of nodes andEis the set of edges. Each node represents a sensor and each edge represents a wireless channel between sensors. The set of nodes is denotedV ={v₁, v₂, . . . , v_n}and each node is endowed with a number, say,pifor nodevi; this number quantifies the worth of the sensor’s reading (sensors with better measurement accuracies have higherpi).

It helps to think of thepi’s as prizes available to be collected—only once—at the nodes.

Figure 1.1 displays a graph with 20 nodes in which the size of each node is made proportional to its prize. The two extreme nodes contain prizes much larger than the average, which are called by us as jackpots.

8.387

7.62

5.592

11.66

7.26

1243 8.483

8.119

10.57

12.38 14.31

8.205

9.007

10.3 12.96

14.44

1044

8.467 11.94

13.38

Figure 1.1: A graph with prizes across nodes. Prizes were generated from an uniform distribution(5,15) except for the jackpot nodes which were further multiplied by100.

Assume the given path’s length isK. In this thesis, we focus on the following Prize Accumulation Problem (PAP):find a path with lengthKthat maximizes the sum of accumulated prizes.

Note the importance of the graph structure on the problem statement; if there were no graph (equivalently, the graph was fully connected, i.e., each node could reach any other node) the problem would have a trivial solution: choose the nodes with the topKprizes. For a generic graph, this choice is usually

(15)

infeasible because the nodes with best prizes cannot be visited in a path with lengthK. For example, if K= 2, the two jackpot nodes in the graph of Figure 1.1 cannot be visited in a path of lengthK= 2.

1.2 Application example

We illustrate a popular instance of the prize accumulation problem. The parameter of interest is denoted byθ∈Rand sensorireads it embedded in gaussian noise:

yi=θ+ηi,

whereyiis the observation available at sensoriandηi∼ N(0, σ_i²)stands for a gaussian random variable with zero mean and varianceσ²_i, i.e., with probability density function

p(η_i) = 1

√2πσ_ie⁻

1 2

ηi2 σi2.

The variance sets the sensor accuracy: the smaller the noise strengthσ_i², the better the sensor accuracy.

Suppose we activateKsensors. To simplify notation, we assume that we activate sensors1toKand that those sensors constitute a feasible path in the graph. The observation vectory:= (y1, y2, . . . , yK)∈R^K is given by

y=θ1_K+η

where1K:= (1,1, . . . ,1)andη:= (η1, η2, . . . , ηK).

Assuming the measurements are independent, the maximum-likelihood (ML) estimator (executed at the end of the token’s path) is

θb_ML = argmax

θ

log p(y; θ)

= argmin

θ K

X

i=1

(yi−θ)² 2σi2

= PK

i=1 yi

σi2

PK i=1

1 σ_i²

.

It is straightforward to show that this estimator is unbiased and its variance is var

θb_ML

= E

θ−θb_ML²

= 1

PK i=1

1 σ_i²

.

We see that the precision of the estimator, i.e., the inverse of its variance, is given by the sum of the

(16)

precisions of the sensors involved in the path:

1 var

MLc =

K

X

i=1

p_i

with

pi:= 1 σ_i².

This conclusion applies to any other path of length K. Thus, in order to reduce the variance of the estimator, we must select the path of lengthKthat maximizes the sum of the visitedp_i’s—an optimization challenge that matches our problem formulation.

1.3 Related literature and thesis contributions

The research literature addresses several flavors of the generic problem of selectingK most informative sensors.

Works close to ours are Lee and Dooly [1996], Lee and Dooly [1998], Joshi and Boyd [2009], Ya- mamoto et al. [2009] and Bian et al. [2006]. They differ from our problem statement in that either the merit of each subset ofKnodes cannot be cast as a sum of node prizes, or the set ofKnodes are only required to form a connected subgraph—a path always generate a connected subgraph, but a general connected subgraph withKnodes cannot always be generated by a path withKnodes. The latter flavor is known as the Maximum Weight Connected Graph Problem.

Our problem statement falls into the scope of the work in Chekuri and Pal [2005] which proposes an heuristic to optimize a submodular function over the paths ofKnodes.

Contributions. In this thesis we propose several approaches to tackle our problem statement. The algorithms we develop apply tools from linear programming (LP) and dynamic programming and offer different trade-offs in terms of solution sub-optimality and computational complexity.

The key finding is that our idea of combining the rollout framework developed in Bertsekas et al.

[1997], Bertsekas and Castanon [1999] and Bertsekas [2013] with the heuristic in Chekuri and Pal [2005] gives rise to a powerful solution method that matches the performance of state of art integer linear program (ILP) solvers packed with a profusion of sophisticated branch-and-bound shortcuts. Our approach is far simpler to implement (a modest number of lines in Matlab) and, contrary to the ILP solver, has a predictable running time.

1.4 Thesis organization

In chapter 2, we develop a brute-force method that solves the prize accumulation problem. We use it to benchmark the performance of the remaining algorithms that we create throughout the thesis.

(17)

In chapter 3, we present one of our major contributions: an integer linear program (ILP) formulation of the prize accumulation problem. This allows to solve the prize accumulation problem through any general-purpose ILP solver. Chapter 4 examines the natural relaxation of the ILP formulation.

In chapter 5, we produce several greedy algorithms. Section 5.5 contains our main contribution:

the fortified recursive greedy search. This is a simple method that competes with the sophisticated ILP solver.

Chapter 6 discusses an extension of the prize accumulation problem to several prize categories.

Chapter 7 draws the main conclusions.

Finally, we test our algorithms on several random graphs: appendix A explains how those are con- structed.

(18)

(19)

Chapter 2

Brute-force solution

2.1 Chapter organization

In this chapter, we propose an algorithm that solves the prize accumulation problem via exhaustive enumeration, i.e., we carry a brute-force search over the set of all paths of lengthK. The computation time of this approach is clearly infeasible for medium to largeK. However, the method sets an useful benchmark against which we can compare the additional algorithms that we develop throughout the thesis, both in terms of computational time and solution quality.

We present the algorithm in section 2.2 and some numerical results in section 2.3. Section 2.4 looks at the computational complexity of the method.

2.2 A brute-force algorithm

Generating a path of lengthK consists in choosingK nodes—vi ∈ V fori = 1, . . . , K—such that v1→v2→ · · · →vKis a valid path in the graph.

Thus, we can view this problem as a genericAssignment ProblemwithKvariables: we must assign a node inV to each of theKvariablesv₁, v₂, . . . , v_K; thek-th variable is the node visited at thek-th step of the path.

We propose an algorithm based on the Constraint-Satisfaction-Problem (CSP) Solver using Back- tracking Search, presented in Russell and Norvig, 2009 and adapted to theGeneral-Assignment-Problem.

The algorithm is presented as Algorithm 2.1 and given in a generic form, as the reader may find it useful in other contexts.

In terms of the prize accumulation problem, we make the following identifications: anassignment consists of an ordered set of nodes (i.e., a path); avariabledefines what node is visited atk-th position (i.e., the visited node atk-th instant on the path), and belongs to an assignment;constraintsdefines the structure of the graph (an graph adjacency matrix, for example);constantsdefines the prizes existent at each of the nodes. Also, the auxiliary methods used in the algorithm, are as follows:

Complete tests if a provided assignment is already complete, i.e., all the variables that compose that

(20)

assignment have a value assigned to them. In our case, test if the path has already the de- sired/maximal length.

MeritFunction An assignment, when complete, has a corresponding merit in terms of the provided constants that differentiates it from all other possible variable assignments.MeritFunctioncomputes that merit. In our case, it computes the prize-sum related to the found path at the moment.

SelectUnassignedVar gets the first variable in the provided assignment that is not yet defined, i.e., does not have yet a value binded to it. Also, provides the sub-domain for that same variable, i.e., the values that the variable can now assume while still respecting the constraints that define the problem for the values already assigned for all other variables. In our case, it dictates the subset of nodes that can be visited at the next time-instantk, having defined what node was visited at instantk−1.

Add Assigns the provided value to the variable belonging to the provided assignment. In our case, it defines the node to be visited at time-instantk.

Algorithm 2.1Recursive Optimal Assignment Search

procedureRECURSIVESEARCH(assignment,constraints,constants)

ifCOMPLETE(assigment)then .An assignment is complete if all of its variables are defined.

merit←MERITFUNCTION(assignment, constants) return{assignment,merit}

else

{var, varDomain} ←SELECTUNASSIGNEDVAR(assignment, constraints) bestSolution←N aN

bestM erit←0

for allvalue∈varDomaindo

assignment_i←ADD(assignment, var, value)

{solutioni, meriti} ←RECURSIVESEARCH(assignmenti, constraints, constants) ifmeriti> bestM eritthen

bestM erit←meriti

bestSolution←solutioni

end if end for

return{bestSolution, bestM erit}

end if end procedure

2.3 Numerical results

We tested our algorithm on the graph with20nodes from figure 1.1. Recall that the graph has two jackpots.

In figure 2.1 we show how the running time varies with the path lengthK; we clearly see the exponential growth.

In figure 2.2 we plot the prize accumulated as a function of the path length. Note the sudden jump in the accumulated prize as K changes from 10 to 11—a consequence of the fact that the distance between the two jackpots is11.

(21)

0 5 10 15 20 25 10⁻⁶

10⁻⁴ 10⁻² 10⁰ 10² 10⁴ 10⁶

Optimal Brute Force Search CPU Time versus Path Length

Path Length [node]

Computation Time [second]

Figure 2.1: Running time of the brute-force method versus path length.

0 5 10 15 20 25

1200 1400 1600 1800 2000 2200 2400 2600

Optimal Brute Force Total Collected Prize versus Path Length

Path Length [node]

Absolute Total Collected Prize

Figure 2.2: Prize accumulated by the brute-force method versus path length.

2.4 Complexity

We now show that the computational complexity of the algorithm is exponential in the mean degree of the graphG= (V, E). More precisely, the complexity is

O

|V| µ^K µ−1

whereµis the mean degree¹of the graph andKis the path length. The mean degree ofGis defined as

µ := E(|Nv|)

= 1

|V| X

v∈V

|Nv|

1The degree of a node is the number of neighbors of that node.

(22)

whereNv is the set of neighbors ofvin the graph.

We start by noting that each instance of the algorithm recursive procedure calls itselfµtimes. The maximum value foriis determined by the length of the path to be found, i.e.,i∈ {1, . . . , K}. That means that we will haveKlevels of recursion. We can easily see that the total ofrecursion-callswill be

C=|V|+|V|µ+|V|µ²+· · ·+|V|µ^K

which can be compactly represented as

C = |V|

K

X

i=0

µⁱ

= |V|µ^K+1−1 µ−1 .

Sinceµ^K 1, we approximate

C≈ |V|µ^K+1 µ−1.

(23)

Chapter 3

Formulation as an integer linear program

3.1 Chapter organization

We show that the Prize Accumulation Problem (PAP) can be formulated as an Integer Linear Program (ILP). In the next chapter we make use of this formulation to develop a sub-optimal algorithm.

To reach the ILP formulation, we need to introduce the concept of space-time graph (STG)—an operation that produces a large graph by creating several time folds of a given one. We introduce STGs in section 3.2 and exploit them to devise our ILP formulation in section 3.3. In section 3.4 we present numerical results comparing the ILP formulation with the brute-force algorithm.

3.2 Space-time graph

The Prize Accumulation Problem (PAP) seeks the path of lengthK that reaps the largest sum of prizes. This problem statement evokes a search over a discrete domain—the set of all paths of length Kin a given graphG= (V, E). To escape the combinatorial nature of the problem we need to represent paths in an optimization-friendlier setting. The concept of network flows on STGs provides a suitable new mathematical home for our optimization variable.

3.2.1 How to represent a path?

A path in a given graph, say,12→ 7→ 9 →20, can be thought of as the route traced by one unit of water that enters the graph at node 12and leaves it at node20. In this example, we visualize the water entering the graph at node 12, flowing through the edge12 →7, then through the edge7 → 9, and finally through the edge9→20, after which it leaves the graph. The edges act as a water plumbing system; a path corresponds to a particular journey through the pipelines.

(24)

Network flows. The example generalizes, i.e., we can express any path via anetwork flow. By definition, a network flow is any feasible distribution of water across the edges of a network¹ —by feasible distribution we mean a distribution that conserves water at each node like Kirchoff’s law for currents in electrical networks.

As we want to consider general entry and exit nodes (i.e, not fixeda priori) we need to work with an augmented graph. We now detail the procedure.

Augmented graph. LetG= (V, E)be the given graph. We start by creating two extra nodes: a source nodesand a sink nodet. Those nodes represent the virtual source and virtual sink of the unit of water that will move aroundG. We then introduce new arcs²fromsto every node of G, and from those tot.

We denote the new graph byG_f = (V_f, E_f), where thef subscript stands forflow. Note thatV_f has two more nodes thanV andE_fhas a total of(2|V|+|E|)arcs.

Letx_f∈R^|E^f^|be a vector that represents a flow: the entryx_f_j represents the amount of flow through thejth arc of G_f. To support a feasible distribution of water, the vectorx_fcannot be arbitrary: it must obey the flow conservation laws. These are encoded in the linear system

A_fx_f=b_f (3.1)

whereA_fis thenode-arc incidence matrix associated toG_f andb_f is thesupply-demand vector. Rowj ofA_fand the correspondingj-th entry ofb_fdefine the flow conservation at nodej; each row ofA_fsums out all variable flows (represented byx_f) entering or exiting the associated node, andb_f represents the corresponding external supply or extraction (demand). If thej-th entry ofb_fequals zero then nodejis a transportation node (neither receives nor leaks water from and to the exterior); the sum of flow through the incoming edges matches the sum of flow through the outgoing edges.

In sum, a nonnegative vectorx_fis called a network flow if it obeys (3.1).

v ₂ v ₁ e ₂

e ₁

3 1

e ₃

Figure 3.1: Toy graph

1Network is a synonym for graph.

2An arc is sometimes defined in the literature as being a directed edge, or a ordered pair of nodes. We use both definitions interchangeably.

(25)

Example of a flow conservation system. Consider the graph in Figure 3.1. Here, three units of water are injected at node1and one unit is extracted from node2. The flow conservation system is





−1 1 0 0 −1 1





| {z }

Af





 x_f₁ x_f₂ x_f₃







| {z }

x_f

=



 3

−1





| {z }

b_f

.

Note thatx_f_j is the amount of water traversing edgee_j. The first row of the equation defines the flow balance at nodev₁,

−x_f₁+x_f₂ = 3,

and the second row the flow balance at nodev2,

−x_f₂+x_f₃ =−1.

v ₁

v ₃

v ₄ v ₂

e ₂ e ₁ e ₄ e ₃

e ₆ e ₅

Figure 3.2: A graphG= (V, E)withV ={v1, v2, v3, v4}andE={e1, e2, e3, e4, e5, e6}.

Example of an augmented graph. LetGbe the graph in figure 3.2. The node-arc incidence matrixA and supply-demand vectorbof the graphGare:

A=







1 −1 1 −1 0 0

−1 1 0 0 0 0

0 0 −1 1 1 −1

0 0 0 0 −1 1







, b=





 0 0 0 0







₄

1 t





1^T_|V_| 0^T_|E| 0^T_|V_|

−I_|V_| A I_|V_| 0^T_|V_| 0^T_|E| −1^T_|V_|







=







+1 +1 +1 +1 0 0 0 0 0 0 0 0 0 0

−1 0 0 0

0 −1 0 0

0 0 −1 0

0 0 0 −1







+1 −1 +1 −1 0 0

−1 +1 0 0 0 0

0 0 −1 +1 1 −1

0 0 0 0 −1 +1







+1 0 0 0

0 +1 0 0

0 0 +1 0

0 0 0 +1

0 0 0 0 0 0 0 0 0 0 −1 −1 −1 −1







and

b_f=

1, 0, 0, 0, 0, −1

. (3.3)

The catches of representing paths via network flows. The notion of flows pervades network theory and it is only natural to use flows to encode paths. However, flows express much more than paths, i.e., not all flows arise from paths. Flows are richer objects. They can expressfractional paths, support disconnected paths and hostcirculations. We illustrate those phenomenons via the graph in figure 3.2 and its augmented counterpart in figure 3.3.

1. Flows can expressfractional paths. Two possible solutions (among infinitely many others) for the

(27)

unknownxin the flow conservation systemA_fx=b_fare z_f=

0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0

and

x_f=

0.5, 0, 0.5, 0, 0.3, 0, 0.2, 0, 0.1, 0.1, 0, 0.3, 0.7, 0 ,

i.e., there holdsA_fz_f=b_fandA_fx_f=b_f.

The two solutions are structurally distinct:z_fis abinarysolution whereasx_fis afractional solution.

That is, the entries ofz_f live in the discrete set{0,1} whereas the entries of x_f spread over the interval[0,1]. Throughout the thesis, binary solutions are labeled with a “z”; fractional ones are labeled with a “x”.

The edges of the graph indicated by the nonzero entries of z_f illuminate a path in the graph.

However, the solutionx_frepresents a water distribution that forks at some nodes; as such,x_fdoes not depict a legitimate path;

2. Flows can supportdisconnected paths.Another solution forA_fx=b_fis

x_f=

1, 0, 0, 0, 0, 0, 0, 0, 0.5, 0.5, 1, 0, 0, 0

. (3.4)

In figure 3.4, we visualize this solution in the sense that the edges corresponding to the nonzero entries ofx_femerge as thicker lines. We see thatx_fcarries two disjoint paths;

v1

v3

v4

v2

e2

e1

e4

e3

e6

e5

s

t

s1 s₃ s₂ s4

t₁ t3 t2

t4

1

Figure 3.4: The flow in (3.4) bears two disjoint paths:s→v1→tandv3→v4→v3. 3. Flows can host circulations. Yet another solution is

x_f=

0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0

. (3.5)

Figure 3.5 illustrates this solution by flagging its nonzero entries with thicker edges. We see that x_f represents a loop within the graph, known as a circulation. In terms of the plumbing system

(28)

v1

v3

v4

v2

e2

e1

e4

e3

e6

e5

s

t

s₁ s₃ s₂ s₄

t1 t3 t2

t4

1

Figure 3.5: The flow in (3.5) is a circulation.

analogy, we face an intriguing event: there is water inside the network that comes from nowhere. . .

Conclusion. The set of paths in a graph can be embedded in the set of flows of the corresponding augmented graph. However, since flows contains additional artifacts we cannot directly adopt the set of flows as our optimization domain.

In the next section, we mitigate this difficulty by using flows on the space-time graph (not on the augmented graph).

3.2.2 How to represent a path? Part 2

The space-time graph (STG) spans many instances ofG= (V, E)through time. If we are interested in paths of lengthK, the STG spansKcopies ofG.

Construction of the STG. We start by creatingKcopies of the set of nodes ofG:V. Each one of the Kreplicas is atemporal layerof the STG and is denoted bylk,k= 1, . . . , K. Finally, we link the temporal layers: for each arcvi →vjinG, we insert an arc connecting every pair of consecutive temporal layers l_kandl_k+1through nodesv_iatl_kandv_jatl_k+1, fork= 1, . . . , K−1. Consider the graphGin figure 3.2:

the arce₃points fromv₁tov₃; figure 3.6 shows the replication ofe₃in the STG (we considerK= 3).

Finally, we add a source nodesand a sink nodet, and join them to the first and last temporal layer, respectively.

Example. In figure 3.7, we show the STG corresponding to the graph in figure 3.2—we omit the repli- cations of arcse₁,e₂,e4ande₅, to avoid overloading the figure.

(29)

e3,0

v1,1

v3,1

v4,1

v2,1

l=2

v1,2

v3,2

v4,2

v2,2

l=3=K

e3,1

v1,0

v3,0

v4,0

v2,0

l=1

Figure 3.6: Partial construction of the STG corresponding to the graph in figure 3.2: arc e3 appears replicatedK−1times. We considerK= 3layers.

Conservation system. We denote the STG byG_tf= (V_tf, E_tf), wheretfsubscript stands fortemporal flow. A flow inG_tfis any nonnegative vectorx_tfthat obeys the conservation system

A_tfx_tf=b_tf,

whereA_tfis the node-arc incidence matrix of the STG and follows from the node-arc incidence matrixA of the graphGas follows:

A_tf=







1_|V_|^T 0_|E|^T 0_|E|^T · · · 0_|E|^T 0_|E|^T 0_|V_|^T

−I_|V_| A⁺ 0 · · · 0 0 0

0 A⁻ A⁺ · · · 0 0 0

0 0 A⁻ · · · 0 0 0

... ... ... . .. ... ... ...

0 0 0 · · · A⁻ A⁺ 0

0 0 0 · · · 0 A⁻ I_|V_| 0_|V_|^T 0_|E|^T 0_|E|^T · · · 0_|E|^T 0_|E|^T −1_|V_|^T







where1_|V_|and0_|V_|are column-vectors of|V|ones or zeros, respectively,I_|V_|is the|V|-by-|V|identity matrix. Also, A⁺ is a matrix with the same dimension of A that keeps only its positive entries; the remaining are set to zero. The matrixA⁻ follows the same definition but for the negative entries ofA.

The pattern-column with descending A⁺ and A⁻ entries appearsK −1 times: the same number of temporal layer interconnections inG_tf. For example, for the STG in figure 3.7,K= 3and the matrixA_tf

(30)

e3,0

v1,1

v3,1

v4,1

v2,1

l=2

v1,2

v3,2

v4,2

v2,2

l=3=K

e3,1

v1,0

v3,0

v4,0

v2,0

l=1

s

s₁ s₃ s₂ s₄ 1

t

t1

t3 t2

t4

1 e4,0

e4,1

Figure 3.7: Partial construction of the STG corresponding to the graph in figure 3.2: arcse1,e2,e4 and e5are not replicated to avoid overburdening the picture . We considerK= 3layers.

would be:

A_tf =







1_|V_|^T 0_|E|^T 0_|E|^T 0_|V_|^T

−I_|V_| A⁺ 0 0

0 A⁻ A⁺ 0

0 0 A⁻ I_|V_|

0_|V_|^T 0_|E|^T 0_|E|^T −1_|V_|^T







(3.6)

Note thatx_tfhas now a total of2|V|+ (K−1)|E|entries, corresponding to the same total of arcs in the STG.

Also,b_tfis the supply-demand for the STG, and we define it (as before) equal to+1at the first entry,

−1at the last one, and zero elsewhere. For the given example:

b_tf=

1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, −1

(3.7)

Properties of the STG. The STG of any graph is acyclic, i.e., it is loopless. As such, it prohibits the presence of circulations—one of the main drawbacks of augmented graphs. Indeed, in a STG, we have arcs only pointing downwards, from layer to layer. So, it is impossible to traverse a cycle within it, i.e., to start a path at any nodev, and end at that same nodev; we would need to get upwards among the

(31)

layers.

It follows that STGs cannot support disconnected paths. The flow balance equationA_tfx_tf=b_tfmakes any flow start at the source nodesand end at the nodet.

Finally, ifz_tfis a binary flow in a STG it is easy to recover the corresponding path in the original graph:

the trajectory (encoded inz_tf) of the unit of flow crossing the STG passes through exactly one node of each temporal layer; if we drop the time stamp of each such visited node, we trace a path in the original graph.

3.3 The prize accumulation problem as an integer linear program

Preliminaries: the matrix of visits. Let G= (V, E)be the given graph and supposex_f represents a path ofGin the augmented graphG_f, i.e.,x_fis a flow inG_f that contains no disconnected paths nor circulations and it is a binary vector: thejth entry ofx_fis1if arcj belongs to the path, and0otherwise.

The number of visits that the path embedded inx_fpays to a general nodevcan be obtained as

b^T_vx_f

whereb_vis an|E_f|-dimensional binary vector whosejth entry is

b_{v j} =







1, if arcjpoints into nodev 0, otherwise.

(3.8)

We define thematrix of visits as the|V|-by-|E_f|binary matrix that stacks all those vectors; itsv-th row isb^T_v:

B=





 bv₁T

bv2

T

... bv_|V_|T







. (3.9)

This matrix is an useful device to define the integer linear program formulation.

Integer linear program (ILP) formulation. We work in terms of the STG of the given graphG= (V, E).

The path lengthKdetermines the number of temporal layers of the STG.

Any flowx_tfin the STG starts at the sourcesand ends at the sinkt, crossing all layers in between.

To retrieve the number of visits that the path encoded inx_tfpays to a general node inG, we have to adapt the matrix of visitsBto the STG, as follows:

B_tf=h

I_|V_|×|V_| B B . . . B 0_|V_|×|V_| i

. (3.10)

We concatenate horizontally a|V|-by-|V|identity matrix, a replication of B byK−1 times, and finally a |V|-by-|V|matrix of zeros. The identity matrix accounts for the arcs of G_tf pointing from the source

(32)

node into every node in the first layer, and the zero matrix is due to the fact that every node in the last layer has an arc pointing into the sink node. The replication of B is needed because, when walking a path alongG_tf, a visit made tomreplications of a same nodev, in the different layers, counts asmvisits made to the same nodevinG, at different time instants.

We define the node-visit map

s(x_tf) =B_tfx_tf (3.11)

that returns the number of visits paid to each node inGby the path embedded in the flowx_tfinG_tf. Note thats(x_tf)∈ R^|V^|is indexed by the nodes ofG: forv ∈ V, the entrys_v(x_tf)indicates how many times nodevis visited by the path underlyingx_tf. Also, the maps(·)is linear.

Letp_v denote the prize available at nodev∈ G. We can formulate the prize accumulation problem as

maximize

x_tf

P

v∈V p_vmin{1, s_v(x_tf)}

subject to A_tfx_tf=b_tf

x_tfj∈ {0,1}, for allj= 1, . . . ,|E_tf|.

(3.12)

Each entry of the optimization variablex_tfis constrained to the binary alphabet{0,1}: we switch on or off the arcs of the STG. The conservation systemA_tfx_tf=b_tfguarantees that a valid path is formed. In the objective function we must insert themin{·}operator in order to collect at most once the node prizes.

Problem (3.12) is not an ILP since the objective is not linear in the optimization variable. However, we can cast it as an ILP by introducing epigraph variablest= (tv)_v∈V:

maximize

xtf,t

P

v∈V p_vt_v subject to tv ≤1, forv∈V

tv ≤sv(x_tf), forv∈V A_tfx_tf=b_tf

x_tfj ∈ {0,1}, forj= 1, . . . ,|E_tf|.

(3.13)

3.4 Numerical results

Problem (3.13) can be tackled by any ILP solver. Here, we report our experience with a solver named SCIP (Solving Constraint Integer Programs), which is one of the fastest non-commercial solvers of mixed integer programming (MIP) and mixed integer nonlinear programming (MINLP). SCIP is presented in Achterberg [2009].

SCIP capitalizes on constraint relaxation techniques and it is highly performant, as it is implemented in the C programming language. It combines the branch-and-bound technique with the revised Simplex algorithm. The branch-and-bound technique is executed by building a search tree over the problem space, and SCIP tries to avoid to search all space by using a series of heuristics in order to diminish the gap between lower and upper bounds of the objective function in the optimal feasible (integer) point.

The solver iteratively tightens the optimality gap to zero, thereby arriving at the solution.

(33)

Drawback. Although the SCIP finds the optimal solution of the prize accumulation problem, its running time is highly unpredictable. This is due to the intrinsic unpredictibality of the heuristics that the solver uses to manage the search tree: those heuristics depend strongly on the problem structure as well on the fine tuning of many solver parameters.

Numerical results. Despite this unpredictability, we find that SCIP solves the graph in figure 1.1 in less time than the brute-force search method, for high values of the path lengthK. Figure 3.8 compares the two approaches. It is interesting to note the critical point around the path length of 10. For any path length below that, we see that it is preferable to use the brute-force search instead. This might be due to the initial overhead of computations made by the SCIP framework (pre-solving procedures, instantiation of the search tree, etc.). For larger values of the path length, our ILP formulation permits to beat the brute-force method. Note that the prize accumulated by SCIP coincides with the one accumulated by the brute-force search, since both methods return the optimal solution, i.e., figure 2.2 applies to both SCIP and brute-force search.

0 5 10 15 20 25

10⁻⁶ 10⁻⁴ 10⁻² 10⁰ 10² 10⁴ 10⁶

[Linear Program − Branch & Bound Search]

Absolute CPU Time vs. Path Length

Path Length [node]

Absolute Computation Time [second]

LP − Optimal Branch & Bound Search Optimal Brute Force Search

Figure 3.8: Running time of SCIP and brute-force search versus path length.

Further numerical results. We also tested SCIP on a larger graph, with 100 nodes and maximum degree of 4 (that is, each node has four neighbors, at most). The running time is plotted in figure 3.9 and the figure suggests an exponential dependence on the path length. In figure 3.10 we plot the prize accumulated by SCIP as a function of the path lengthK. We distributed4jackpots across the nodes as the figure confirms: we note4jumps in the collected prize. AtK= 27, all jackpots were collected.

(34)

0 5 10 15 20 25 30 35 10⁻⁶

10⁻⁴ 10⁻² 10⁰ 10² 10⁴ 10⁶

Absolute CPU Time vs. Path Length

Path Length [node]

Absolute Computation Time [second]

LP − Optimal Branch & Bound Search

Figure 3.9: Running time of SCIP on a graph with100nodes.

0 5 10 15 20 25 30 35

1000 1500 2000 2500 3000 3500 4000 4500 5000

Absolute Total Collected Prize vs. Path Length

Path Length [node]

Absolute Merit

LP − Optimal Branch & Bound Search

Figure 3.10: Prize accumulated by SCIP on a graph of 100 nodes versus path length.

(35)

Chapter 4

Linear programming relaxation

4.1 Chapter organization

Since (3.13) is an integer linear program (ILP), it will be time-consuming to solve for large graphs. In the field of Optimization, a widespread idea for attacking ILPs consists in examining their linear relaxation; our analysis would be incomplete if we didn’t also explore the relaxation approach and such is the goal of this chapter.

Sections 4.2 and 4.3 describe the linear relaxation and the mechanism to retrieve paths from fractional solutions. Section 4.4 presents numerical results; we find that the popular relaxation technique is disappointingly fruitless for our problem and we expose the fundamental reason behind the lack of success.

4.2 The relaxation

The integer linear program (ILP) from (3.13) is hard to solve due to the binary constraints. We can obtain an approximation by relaxing the integrality constraints:

maximize

xtf,t

P

v∈V p_vt_v subject to tv ≤1, forv∈V

tv ≤sv(x_tf), forv∈V A_tfx_tf=b_tf

0≤x_tfj≤1, forj= 1, . . . ,|E_tf|.

(4.1)

Recall that the linear mapsv(x_tf)was defined in (3.11).

Problem (4.1) is a linear program (LP) and easy to solve numerically. The size of the variable is

|E_tf|+|V|, or, equivalently,3|V|+ (K−1)|E|, where|V|is the number of nodes in the original graphG, Ethe number of arcs inGandKis the path length.

Since we relaxed the integrality constraints, the feasible set of problem (4.1) contains the feasible set of problem (3.13), i.e., in (4.1) we search over a larger domain.

(36)

Once the relaxation is solved. Suppose we solve (4.1) and obtain the solutionx^?_tf. Two alternatives arise:

1. x^?_tf is a binary flow. In this case, x^?_tf is feasible for (3.13) and therefore also solves it. There was no loss in approximating (3.13) with (4.1). Moreover, sincex^?_tf is a binary flow on the space-time graph (STG)G_tf, we can easily extract the underlying path in G(recall that the STG contains no circulations);

2. x^?_tf is a fractional flow. The most frequent case. Here, the flowx^?_tf does not yield a valid path in the graphG. However: (i) it provides an useful upper bound on the maximum sum of prizes that can be collected by any path of lengthK and (ii) it suggests (hopefully good) paths since we can extract several paths from the fractional flowx^?_tf; we discuss this topic in the next section.

4.3 From fractional flows to paths

We solve (4.1) and obtain a fractional flowx^?_tfas the solution. How can we extract a (hopefully good) path from x^?_tf? We use a technique based on a core result in network theory: the flow decomposition theorem.

Flow decomposition theorem. The flow decomposition theorem states that any fractional flow can be seen as a mixture of binary flows and circulations. In particular, x_tf^? can be written as a linear combination of several binary flows and a residual fractional flow distribution:

x_tf^?=ζ₁z_tf⁽¹⁾+ζ₂z_tf⁽²⁾+ζ₃z⁽³⁾_tf +· · ·+ζ_Nz_tf^(N⁾+x^(N)_tfR (4.2) where 0 ≤ ζi ≤ 1 are weights,z_tf⁽ⁱ⁾ are the binary flows and x^(N_tfR⁾ is a fractional residual flow. Equa- tion (4.2) expresses the fractional flowx^?_tfas a superposition ofNbinary flows plus an residual fractional leftover.

The number of binary flows is not knowna priori but the flow decomposition theorem provides an upper bound: for a flow in a generic graph, N ≤ P +C where P is the total number of paths and C is the total number of directed cycles. Because we are optimizing over the ST G—which is acyclic by construction—we have C = 0. A bound for P is the number of nodes in the graph; for the ST G:

P ≤ |V|_tf=K|V|+ 2.

Since each z⁽ⁱ⁾_tf is a binary flow on the STG, it corresponds to a path in G. By carrying out the decomposition in (4.2) we thus find N paths embedded in x^?_tf that are (hopefully) good for the prize accumulation problem.

We now explain how to perform the flow decomposition, i.e., we describe a method that takes in a fractional flowx^?_tfand outputs the data on the right side of (4.2).

Flow decomposition algorithm. The method is iterative. It reveals the data one piece at a time leaving less and less flow in the residual as the iterations proceed. In the first iteration, we look atx^?_tf

(37)

and obtain a weightζ1, a binary flowz⁽¹⁾_tf and a residual fractional flowx⁽¹⁾_tfR:

x^?_tf=ζ1z_tf⁽¹⁾+x⁽¹⁾_tfR.

In the second iteration, we look at the residual x⁽¹⁾_tfR and obtainζ₂, z_tf⁽²⁾ and a reduced residual x⁽²⁾_tfR: x⁽²⁾_tfR =ζ2z_tf⁽²⁾+x⁽²⁾_tfR. Thus, at this point:

x^?_tf=ζ₁z⁽¹⁾_tf +ζ₂z_tf⁽²⁾+x⁽²⁾_tfR.

And so on. Each iteration looks at the previous residual and tries to decompose it further. The weights ζi decrease along iterations,ζ1 ≥ζ2 ≥ · · · ≥ζN, meaning that the dominant paths are disclosed first.

The method is given as Algorithm 4.1.

Algorithm 4.1Flow Decompositon procedureFINDPATH(G_tf,x_tf)

token←s z_tf⁽ⁱ⁾←0_|E_tf_|^| ζi←1

whiletoken6=tdo

Outset←FINDOUTGOINGARCSFROMNODE(token, G_tf)

(maxF lowArc, maxF lowM agnitude)←GETMAXFLOWARCFROMSET(Outset, x_tf) z_tf⁽ⁱ⁾_j←1, wherej←INDEXOF(maxF lowArc)

nextN ode←NODEPOINTEDBYARC(maxF lowArc) ifmaxF lowM agnitude < ζithen

ζi←maxF lowM agnitude end if

token←nextN ode end while

return{ζ_i, z_tf⁽ⁱ⁾} end procedure

procedureCOMPUTERESIDUALFLOW(x_tf,z_tf⁽ⁱ⁾,ζ_i) returnx_tfR←(x_tf−ζiz_tf⁽ⁱ⁾)

end procedure

The mechanics of Algorithm 4.1. For each iteration, we insert a token in the source nodesat the top of the STG and descend the token—from temporal layer down to temporal layer—till it reaches the sink nodetat the bottom of the STG; to lower the token from a given node, we choose the outgoing arc that carries the largest amount of residual flow.

The path traced by the token defines a binary flowz⁽ⁱ⁾_tf . We define the corresponding weightζ_i to be the minimum flow in the arcs composing the path. Since we inject one unit of flow in the STG, the weights lie in the range[0,1].

We then subtract the flowζ_iz_tf⁽ⁱ⁾from the current residual to obtain the next residual and repeat the procedure (unless the new residual is empty and we stop).

We now detail the elements present inside the algorithm. The variable z_tf⁽ⁱ⁾ is the path (binary flow computed by the algorithm at iteration i, token is the current node the algorithm is focusing on, maxF lowM agnitudeis the largest amount of flow departing from a node through a specific arc andx_tf

(38)

is the fractional flow distribution given as input.

FindOutgoingArcsFromNode finds the set of arcs in the STGG_tfthat depart from the given node.

GetMaxFlowArcFromSet locates the arc in given input set with the largest amount of flow.

IndexOf gets the corresponding index number of the input arc.

NodePointedByArc gets the node pointed by the the input arc.

4.4 Numerical results

We tested the performance of the relaxation approach in the graph with20 nodes from figure 1.1.

Figure 4.1 compares the prize accumulated by the linear relaxation against the prize collected by the brute-force search (the optimal solution). Unfortunately, as we can see, the relaxation approach achieves very poor performance. We explain why in the next paragraph.

2 4 6 8 10 12 14 16 18 20 22

1200 1400 1600 1800 2000 2200 2400 2600

[Linear Program − Flow Decomposition Search]

Absolute Total Collected Prize vs. Path Length

Path Length [node]

Absolute Merit

LP−FD Search Optimal Search

Figure 4.1: Prize accumulated by both the brute-force method and the linear relaxation versus path length.

Zigzag behavior of the fractional solution. In the STG, at each transition from temporal layer to the bottom one, we have a subset of arcs associated with a subset of entries of x^?_tf. Hopefully, the flow decomposition reveals that one of those arcs carries much more flow than the alternative ones for that layer hop. This strongly suggests that an optimal binary solution z^?_tf uses that leading arc to bridge between those layers.

Unfortunately, for the graph in figure 1.1, we do not see the flowx^?_tf concentrating in a single arc between layers. Instead, when we apply the flow decomposition method to the optimal solutionx^?_tf—for several path lengths—we find thatx^?_tfsnakes around the same subset of best nodes (those with bigger prizes) across the several layers in the STG. This phenomenon is hard to visualize for the graph in

Distributed Inference in Sensor Networks