Generation of application specific fault tolerant irregular NoC topologies using tabu search


Departamento de Informática e Matemática Aplicada Bachelor in Computer Science

Generation of Application Specific Fault Tolerant Irregular NoC Topologies Using Tabu Search

Gustavo Alves Bezerra

Natal-RN June 2019


Generation of Application Specific Fault Tolerant Irregular NoC Topologies Using Tabu Search

Undergraduate thesis submitted to the Departamento de Informática e Matemática Aplicada of the Centro de Ciências Exatas e da Terra of the Universidade Federal do Rio Grande do Norte as a partial requirement for obtaining the bachelor’s degree in Computer Science.

Advisor

PhD Monica Magalhães Pereira

Universidade Federal do Rio Grande do Norte – UFRN Departamento de Informática e Matemática Aplicada – DIMAp

Natal-RN June 2019


Bezerra, Gustavo Alves.

Generation of application specific fault tolerant irregular NoC topologies using tabu search / Gustavo Alves Bezerra. - 2019.

119 leaves: ill.

Undergraduate thesis (Bachelor's degree in Computer Science) - Universidade Federal do Rio Grande do Norte, Centro de Ciências Exatas e da Terra, Departamento de Informática e Matemática Aplicada. Natal, 2019.

Advisor: Monica Magalhães Pereira.

Co-advisor: Sílvia Maria Diniz Monteiro Maia.

1. Computing - Undergraduate thesis. 2. Networks on chip - Undergraduate thesis. 3. Irregular topologies - Undergraduate thesis. 4. Application-specific - Undergraduate thesis. 5. Fault tolerance - Undergraduate thesis. 6. Tabu Search - Undergraduate thesis. I. Pereira, Monica Magalhães. II. Maia, Sílvia Maria Diniz Monteiro. III. Title.

RN/UF/CCET CDU 004

Cataloguing in Publication. UFRN - Biblioteca Setorial Prof. Ronaldo Xavier de Arruda - CCET


Generation of Application Specific Fault Tolerant Irregular NoC Topologies Using Tabu Search, presented by Gustavo Alves Bezerra and accepted by the Departamento de Informática e Matemática Aplicada of the Centro de Ciências Exatas e da Terra of the Universidade Federal do Rio Grande do Norte, being approved by all members of the examining board specified below:

PhD Monica Magalhães Pereira

Advisor

Departamento de Informática e Matemática Aplicada Universidade Federal do Rio Grande do Norte

PhD Sílvia Maria Diniz Monteiro Maia

Co-advisor

PhD Márcio Eduardo Kreutz


It would be impossible to conceive this work without the support provided by the professors and UFRN’s Programa de Educação Tutorial - Ciência da Computação. Thus, special thanks to Monica Magalhães Pereira, Sílvia Maria Diniz Monteiro Maia, and Umberto Souza Da Costa.

Thanks to my family for all the love and support, and for withstanding all the difficulties encountered – Erbena Sales Alves Bezerra, José Guilardo Gonçalves Bezerra, and Juliana Alves Bezerra. In addition, thanks to Iria de Fátima Bezerra Pinho for the long distance support.

Thanks to Breno “Blinn” Viana “Phong”, “Deba” Emili Costa, Felipe “Barba-lho”, Jhonattan “Johnson” Cabral, “Pratíxia” Pontes Cruz, Raul “Dalinda” Silva, “Showzivan” Medeiros da Silva Gois, and Vitor “God”eiro for all the discussions, conversations, memes and for turning the last semesters of the Computer Science course into one of the most memorable times of my life.

Thanks to Giorgio Brito, “Juhauare” Jales, Larissa “Lucy”ano, Misa Uehara, and Paola Gessy for being present in some key moments, helping me to keep my sanity. Thanks to Joel Felipe, Vitor “God”eiro (again), and Vitor Greati for directly and indirectly inspiring me to focus and persist in my studies.

Last but not least, thanks to “the dudes” Victor “Polar” Santos, and Yuri “Kbelo” Messias for being present since 2010; and for all the games, CiViKs, defeats, achievements, and coffees shared.


I’ll see you the next time Remember the future is yours”

Nektar, remember the future


Generation of Application Specific Fault Tolerant Irregular NoC Topologies Using Tabu Search

Author: Gustavo Alves Bezerra Advisor: Monica Magalhães Pereira, PhD

Resumo

Networks on Chip (NoCs) were proposed to improve computer performance. The first topologies suggested tended to have a regular structure, aiming at flexibility: reasonable performance for various applications and multiple paths between routers. Regular topologies perform worse than topologies generated for specific applications, which are usually irregular. On the other hand, irregular topologies may have low flexibility. In the billion-transistor era, circuit components are more susceptible to faults, whether caused by radiation, electromagnetic interference or similar effects. Due to the production cost of such circuits, it is desirable to increase their durability (lifespan), performance, and flexibility. Durability can be obtained by adding fault tolerance to a circuit. Therefore, by adding redundant components (routers and links) to a NoC, its durability and flexibility (alternative paths) may be improved, although energy consumption worsens. This work proposes the generation of irregular topologies using Tabu Search, thus generating intermediate topologies: flexible if compared to most irregular NoCs (having some degree of fault tolerance and alternative paths between routers), yet achieving high performance for specific applications if compared to regular NoCs.

Keywords: Network on Chip, Irregular Topologies, Application-Specific, Fault Tolerance, Tabu Search.


Generation of Application Specific Fault Tolerant Irregular NoC Topologies Using Tabu Search

Author: Gustavo Alves Bezerra Advisor: Monica Magalhães Pereira, PhD

Abstract

Network on Chip (NoC) was proposed to enhance computer performance. Initially conceived topologies tended to have a regular structure, aiming at flexibility – reasonable performance for different applications, and multiple paths between routers. Regular topologies lack performance if compared to topologies generated for specific applications, which are often irregular. On the other hand, irregular topologies may lack flexibility. In the billion-transistor era, circuit components are more susceptible to faults, whether caused by radiation, electromagnetic interference or similar effects. Due to the cost of producing such circuits, it is desirable to increase their durability (lifespan), performance, and flexibility. Durability may be achieved by adding fault tolerance to the circuit. Therefore, by adding redundant components – e.g. routers or links – to an irregular NoC, it may be possible to increase its durability and flexibility (multiple communication paths), though energy consumption may be impaired. This work proposes the generation of irregular topologies using Tabu Search, thus generating intermediate topologies: flexible if compared to most irregular ones (some fault resistance), yet achieving high application-specific performance if compared to regular NoCs.

Keywords: Network on Chip, Irregular Topologies, Application-Specific, Fault-Tolerance, Tabu Search.

List of Figures

1 Graph examples.
2 Example of regular NoC topologies.
3 Examples of irregular NoC topologies.
4 Examples of areas isolated in NoCs after faults.
5 A Task Graph.
6 Example of Task Graph edge conversion.
7 Example of unfeasible solution – UNF.
8 Example of Delete Edges Until Epsilon.
9 Example of Add Edges Until Epsilon.
10 Example of making an unfeasible solution feasible.
11 Tabu List examples.
12 Examples of valid edges’ node swapping process.
13 Example of invalid edge node swapping operation.
14 Example of spin operation with a minimum degree node.
15 Examples of default scenario.
16 Example of successful spin operation with a maximum degree node.
17 Examples of unsuccessful spin operation with a maximum degree node.
18 Example of successful double spin operation.
19 Example of Fault Injection Algorithm.
20 Chosen TGs.
21 Influence of tabuListSize arguments for Latency Estimation in chosen TGs’ generated solutions.
23 Overall median latency for all benchmarked TGs.
24 Box plots of chosen TGs solutions’ latency estimation.
25 Examples of $AP2_{TG}$ solutions.
26 Examples of $MPEG_{TG}$ solutions.
27 Fault injection on median chosen TGs solutions.
28 $S_{AP2_{TG},15}$ behaviour during fault injection.
29 $S_{MPEG_{TG},19}$ behaviour during fault injection.
30 Fault injection on median solutions with median of the chosen TGs.
31 Influence of tabuListSize on $AP1_{TG}$ solutions.
32 Influence of tabuListSize on $AP2_{TG}$ solutions.
33 Influence of tabuListSize on $AP3_{TG}$ solutions.
34 Influence of tabuListSize on $AP4_{TG}$ solutions.
35 Influence of tabuListSize on $INTEGRAL_{TG}$ solutions.
36 Influence of tabuListSize on $MPEG_{TG}$ solutions.
37 Influence of tabuListSize on $MWD_{TG}$ solutions.
38 Influence of tabuListSize on $VOPD_{TG}$ solutions.
39 Influence of terminationCriterion on $AP1_{TG}$ solutions.
40 Influence of terminationCriterion on $AP2_{TG}$ solutions.
41 Influence of terminationCriterion on $AP3_{TG}$ solutions.
42 Influence of terminationCriterion on $AP4_{TG}$ solutions.
43 Influence of terminationCriterion on $INTEGRAL_{TG}$ solutions.
44 Influence of terminationCriterion on $MPEG_{TG}$ solutions.
45 Influence of terminationCriterion on $MWD_{TG}$ solutions.
46 Influence of terminationCriterion on $VOPD_{TG}$ solutions.
48 Fitness (latency estimation) box plots of $AP2_{TG}$ generated solutions.
49 Fitness (latency estimation) box plots of $AP3_{TG}$ generated solutions.
50 Fitness (latency estimation) box plots of $AP4_{TG}$ generated solutions.
51 Fitness (latency estimation) box plots of $INTEGRAL_{TG}$ generated solutions.
52 Fitness (latency estimation) box plots of $MPEG_{TG}$ generated solutions.
53 Fitness (latency estimation) box plots of $MWD_{TG}$ generated solutions.
54 Fitness (latency estimation) box plots of $VOPD_{TG}$ generated solutions.
55 Fault injection on median $AP1_{TG}$ solutions.
56 Fault injection on median $AP2_{TG}$ solutions.
57 Fault injection on median $AP3_{TG}$ solutions.
58 Fault injection on median $AP4_{TG}$ solutions.
59 Fault injection on median $INTEGRAL_{TG}$ solutions.
60 Fault injection on median $MPEG_{TG}$ solutions.
61 Fault injection on median $MWD_{TG}$ solutions.
62 Fault injection on median $VOPD_{TG}$ solutions.
63 Median AP1 with median fitness.
64 Median AP2 with median fitness.
65 Median AP3 with median fitness.
66 Median AP4 with median fitness.
67 Median INTEGRAL with median fitness.
68 Median MPEG with median fitness.
69 Median MWD with median fitness.
70 Median VOPD with median fitness.
71 Median AP1 solution with median fitness after 10% fault injection.
73 Median AP3 solution with median fitness after 10% fault injection.
74 Median AP4 solution with median fitness after 10% fault injection.
75 Median INTEGRAL solution with median fitness after 10% fault injection.
76 Median MPEG solution with median fitness after 10% fault injection.
77 Median MWD solution with median fitness after 10% fault injection.
78 Median VOPD solution with median fitness after 10% fault injection.
79 Median AP1 solution with median fitness after 20% fault injection.
80 Median AP2 solution with median fitness after 20% fault injection.
81 Median AP3 solution with median fitness after 20% fault injection.
82 Median AP4 solution with median fitness after 20% fault injection.
83 Median INTEGRAL solution with median fitness after 20% fault injection.
84 Median MPEG solution with median fitness after 20% fault injection.
85 Median MWD solution with median fitness after 20% fault injection.
86 Median VOPD solution with median fitness after 20% fault injection.
87 Median AP1 solution with median fitness after 30% fault injection.
88 Median AP2 solution with median fitness after 30% fault injection.
89 Median AP3 solution with median fitness after 30% fault injection.
90 Median AP4 solution with median fitness after 30% fault injection.
91 Median INTEGRAL solution with median fitness after 30% fault injection.
92 Median MPEG solution with median fitness after 30% fault injection.
93 Median MWD solution with median fitness after 30% fault injection.
94 Median VOPD solution with median fitness after 30% fault injection.
95 Fault injection on median $S_{AP1_{TG},15}$ solution.
97 Fault injection on median $S_{AP3_{TG},16}$ solution.
98 Fault injection on median $S_{AP4_{TG},16}$ solution.
99 Fault injection on median $S_{INTEGRAL_{TG},15}$ solution.
100 Fault injection on median $S_{MPEG_{TG},19}$ solution.
101 Fault injection on median $S_{MWD_{TG},18}$ solution.
102 Fault injection on median $S_{VOPD_{TG},19}$ solution.
103 $AP1_{TG}$.
104 $AP2_{TG}$.
105 $AP3_{TG}$.
106 $AP4_{TG}$.
107 $INTEGRAL_{TG}$.
108 $MPEG_{TG}$.
109 $MWD_{TG}$.
110 $VOPD_{TG}$.

List of Abbreviations and Acronyms

NoC – Network on Chip
QAP – Quadratic Assignment Problem
TG – Task Graph
MPSoC – Multi-Processor System-on-Chip
CVRP – Classical Vehicle Routing Problem
SEA – Set of Edges to Add

List of Symbols

∅ – Empty Set
∪ – Set Union
¬ – Logical negation
∀ – For all
∈ – In
∧ – Logical conjunction
ε – A fixed number of edges
← – Attribution
∉ – Not in
∨ – Logical disjunction
∃ – Exists
⊆ – Is contained in
⊈ – Is not contained in
$S^C$ – The complement of set S
∩ – Set intersection
N – Natural numbers set

List of Algorithms

1 Tabu Search Algorithm Skeleton.
2 Methodology.
3 Generate Initial Solution Graph.
4 Fit Solution’s number of edges to ε.
5 Deletes edge with largest degree incident nodes possible.
6 Adds edge between the two nodes with smallest possible.
7 Make a Solution Feasible.
8 Implemented Tabu Search.
9 Fitness Function implementation.
10 Neighbourhood Search.
11 Special Neighbourhood Search deletions.
12 Special Neighbourhood Search Additions.
13 Swaps the nodes incident to two distinct edges.
14 Spin edge.
15 Spin edge incident to one maximum degree node.
16 Double spin.
17 Fault Injection.

Contents

1 Introduction
2 Theoretical Framework
2.1 Graph Theory
2.2 Quadratic Assignment Problem Function
2.3 Network On Chip
2.3.1 Topologies
2.3.2 Fault Tolerance
2.4 Task Graphs
2.5 Metaheuristics
2.5.1 Tabu Search
3 Related Works
3.1 Performance
3.2 Fault Tolerance
4 Methodology
4.1 Definitions and Assumptions
4.1.1 Solution Representation
4.1.2 Feasible Solution
4.2 Initial Topology Generation
4.2.1 Fitting to Epsilon
4.2.2 Making Feasible
4.2.2.2 Adding Edges - ADD_EDGE()
4.2.2.3 Make Feasible Algorithm
4.3 Best Solution Search – Tabu Search
4.3.1 Fitness Function
4.3.2 Tabu List
4.3.3 Neighbourhood Search
4.3.3.1 Delete Edge Between Two Minimum Degree Nodes
4.3.3.2 Delete Edge Incident to One Minimum Degree Node
4.3.3.3 Default Scenario
4.3.3.4 Add Edge Incident to One Maximum Degree Node
4.3.3.5 Add Edge Between Two Maximum Degree Nodes
4.4 Fault Injection
5 Results
5.1 Performance
5.2 Latency Estimation After Fault Injection
6 Concluding Remarks
References
Appendix A -- Influence of tabuListSize Arguments
Appendix B -- Influence of terminationCriterion Arguments
Appendix C -- Latency Box Plots
Appendix D -- Fault Injection in Median Fitness Solutions
Appendix G -- Median Epsilon Solutions After 20% Fault Injection
Appendix H -- Median Epsilon Solutions After 30% Fault Injection
Appendix I -- Detailed Fault Injection in Some Median Fitness Solutions


1 Introduction

Technological advances in computers, especially in transistors, have led to an increase in the number of components that fit on a single chip. Consequently, the need to improve communication between chip components has also increased (WANG et al., 2013). There are several ways to achieve this. It is possible to reduce the size of circuit components, thereby shortening the physical distance between them. Hence, component density increases alongside the computational power in a fixed area (SCHALLER, 1997). It is also possible to increase the number of bits transmitted per second by increasing the clock frequency or the number of channels for parallel communication (STALLINGS, 2003; PATTERSON; HENNESSY, 2013). Alongside these techniques, changing the communication protocol may also decrease latency. Network on Chip (NoC) is one such example.

Bits were traditionally transmitted between computer components via a communication bus (STALLINGS, 2003; PATTERSON; HENNESSY, 2013). This solution was satisfactory for the early Computer Age. However, the number of components per chip and the communication demand between components rose over the decades. Hence, bus communication proved to lack flexibility and efficiency as applications’ complexity increased. To solve this problem, the idea of NoC was conceived.

NoCs take advantage of a well consolidated field of Computer Science: Computer Networks (HEMANI et al., 2000). This Computer Science branch has been evolving since the 1960s (KUROSE; ROSS, 2013). Its theory and systems are so sophisticated that the initial limited networks evolved to a worldwide net involving several security techniques (KUROSE; ROSS, 2013).

In order to improve chips by inserting network features, it is necessary to add new components to them: routers. The routers are responsible for transmitting information received from one component to another (ZEFERINO; SUSIN, 2003). Additionally, these components determine the course a message will take through the network. Noticeably, the flexibility of a chip is considerably increased (ZEFERINO; SUSIN, 2003). The chip’s area, however, is also increased (MORAES et al., 2004). Furthermore, due to the processing time required by a router to determine the appropriate action, the communication overhead is also affected (BEIGNÉ et al., 2005).

Initially, NoCs tended to be regularly structured. A few examples are Mesh-2d, Torus and Honeycomb NoCs (ZEFERINO; SUSIN, 2003; HEMANI et al., 2000). A regular router distribution tends to offer greater flexibility than an irregular one, i.e. reasonable performance across different applications and multiple paths between routers. In exchange, efficiency for specific applications is lacking (ASCIA; CATANIA; PALESI, 2004).

On the other hand, irregular NoCs attempt to improve application specific efficiency and network performance compared to regular topologies (CHOUDHARY; GAUR; LAXMI, 2011; CHOUDHARY et al., 2010). Regular NoCs may also become irregular during the circuit’s lifespan due to faults on either routers or links (CHOUDHARY; GAUR; LAXMI, 2011).

Efforts are being made to generate irregular topologies with lower energy consumption (JAIN; CHOUDHARY; SINGH, 2014). In addition, classical routing algorithms such as XY (ZEFERINO; SUSIN, 2003) tend to perform unsatisfactorily when applied to irregular NoCs (RODRIGO et al., 2011). Moreover, such approaches are guaranteed to be deadlock-free only for regular NoCs (RODRIGO et al., 2011). Therefore, many routing algorithms are being conceived to improve network performance while remaining deadlock-free (MILFONT et al., 2017; GABIS; KOUDIL, 2016; LEE; PARIKH; BERTACCO, 2015). Some of these focus not only on deadlock freedom, but also on fault tolerance, congestion management, and livelock freedom.

Like any circuit, NoCs are susceptible to faults. The operation of a chip may be severely compromised depending on the fault location (CHANG et al., 2011). Moreover, it is desirable to increase the lifespan of such circuits due to their fabrication cost. Although care must be taken not to generate a regular topology, a lifespan increase may be achieved by introducing redundant components into the NoC (WANG et al., 2013; MESQUITA, 2016; SHAH; KANNIGANTI; SOUMYA, 2017). During fault tolerant irregular topology generation, constant monitoring is necessary to avoid an increase in energy consumption. Otherwise, the main advantages of irregular topologies would be lost.

In this scenario, the proposed work focuses on generating fault tolerant irregular NoC topologies. It is desired to obtain long-lasting and efficient circuits for specific applications. Notwithstanding, the circuits should be suitable for multiple applications. The generated topologies will be evaluated regarding fault tolerance capacity, and latency.


2 Theoretical Framework

The purpose of this work is to generate task graph based irregular NoCs targeting low latency and reliability via metaheuristics. Some topics require a more solid background and are thus explored in this section: NoC, Task Graph, Metaheuristics and Fault Tolerance.

2.1 Graph Theory

A Graph is defined as a tuple of a set of vertices and a set of edges, G(V, E) (WILSON, 1979). V is often represented as a set of integers. There are multiple representations for E, but in any representation, an edge connects two nodes. Throughout this work, $G_V$ means “the set containing G’s vertices”, and $G_E$ means “the set containing G’s edges”. A Graph may be weighted or unweighted, directed or undirected.

Figure 1: Graph examples. (a) G0; (b) G1.

In a weighted graph, some value is associated with each edge (Graph G0, Figure 1a). On the other hand, unweighted edges have no value associated with them (G1, Figure 1b). A directed and unweighted graph may represent edges as tuples, because $(0, 1) \neq (1, 0)$. On the other hand, undirected unweighted graphs may represent edges as sets, since $\{0, 1\} = \{1, 0\}$ (G1). Hence, self-loops would be represented as a set of one element ($\{0, 0\} = \{0\}$). For directed and weighted graphs (G0), edges may be represented as a triple, i.e. $(v_1, v_2, w)$; while undirected edges may be represented as a tuple of a set and a value, i.e. $(\{v_1, v_2\}, w)$. In this work, directed weighted graphs are used to represent task graphs, while undirected unweighted graphs represent solutions or NoCs. Since no self-loops are allowed for solutions, the edges of undirected unweighted graphs have two elements: $\forall e \in G_E \, (|e| = 2)$.

Another concept used throughout this work is the Null Graph. The Null Graph is the only graph containing 0 nodes, and consequently no edges. Specifically, the undirected unweighted Null Graph will represent an invalid solution or graph. In other words, a tuple of empty sets, i.e. (∅, ∅).
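The edge representations described in this section can be sketched in Python as follows. This is only an illustrative sketch: the names `undirected_edge` and `NULL_GRAPH` are assumptions made for the example and are not taken from the implementation used in this work.

```python
from typing import FrozenSet, Set, Tuple

# Directed weighted edge as a triple (v1, v2, w), as used for task graphs.
DirectedWeightedEdge = Tuple[int, int, float]
# Undirected unweighted edge as a two-element set, as used for solutions (NoCs).
UndirectedEdge = FrozenSet[int]


def undirected_edge(u: int, v: int) -> UndirectedEdge:
    """Build an undirected edge; self-loops are rejected because |e| must be 2."""
    if u == v:
        raise ValueError("self-loops are not allowed in solutions")
    return frozenset((u, v))


# The Null Graph: a tuple of an empty vertex set and an empty edge set.
NULL_GRAPH: Tuple[Set[int], Set[UndirectedEdge]] = (set(), set())

# Order matters for directed tuples but not for undirected sets.
assert (0, 1) != (1, 0)
assert undirected_edge(0, 1) == undirected_edge(1, 0)
```

Using `frozenset` rather than `set` makes each edge hashable, so that edges can themselves be stored in a set of edges.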

2.2 Quadratic Assignment Problem Function

The Quadratic Assignment Problem (QAP) is an NP-hard problem that arises when assigning facilities to locations. This problem is defined by Bokhari (BOKHARI, 1981) and adapted to the current work as follows. The following are given: an affinity measure between two objects $i$ and $j$ (in the current work, the edge weight $w_{ij}$ between nodes $i$ and $j$); $n$ locations (in the proposed work, $n = |TG_V|$, where $TG$ is a Task Graph); the distance $dist_{st}$ between the locations in a Graph $G$ (in the proposed work, the number of hops in $G$’s shortest path from node $s$ to $t$, calculated by Dijkstra’s Algorithm (DIJKSTRA, 1959)); and a function that maps objects to locations (in the proposed work, the identity function, i.e. $i = s$ and $j = t$). Then, minimise the function

$$\sum_{ij} w_{ij} \, dist_{st}. \tag{2.1}$$

It is important to highlight that, in the current work, $dist_{ij} = 1$ for $TG$. However, for $G$, $dist_{ij} = dist_{st} \geq 1$.

Throughout this work, the QAP Function will be used as the Tabu Search fitness function for latency estimation. Such a scenario is possible because the smaller the QAP function value, the smaller shall be the overall latency in a network.
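As an illustration of Function 2.1, the sketch below evaluates the QAP cost of a candidate topology under the identity mapping. It is a minimal sketch under stated assumptions: the names `hop_distances` and `qap_cost` are hypothetical, and breadth-first search is used in place of Dijkstra’s Algorithm, which yields the same hop counts because every link counts as one hop.

```python
from collections import deque
from typing import Dict, Iterable, List, Tuple


def hop_distances(n: int, links: Iterable[Tuple[int, int]], source: int) -> List[float]:
    """Hop counts from `source` to every router of an undirected topology."""
    adjacency: List[List[int]] = [[] for _ in range(n)]
    for u, v in links:
        adjacency[u].append(v)
        adjacency[v].append(u)
    dist = [float("inf")] * n  # unreachable routers keep an infinite distance
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if dist[v] == float("inf"):
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def qap_cost(n: int,
             task_edges: Dict[Tuple[int, int], float],
             links: Iterable[Tuple[int, int]]) -> float:
    """Sum of w_ij * dist_st over all task graph edges, with i = s and j = t."""
    links = list(links)
    cache: Dict[int, List[float]] = {}
    total = 0.0
    for (i, j), weight in task_edges.items():
        if i not in cache:
            cache[i] = hop_distances(n, links, i)
        total += weight * cache[i][j]
    return total


# Illustrative three-task graph; the weights are made up for this example.
tg = {(0, 1): 10, (1, 2): 5, (0, 2): 2}
print(qap_cost(3, tg, [(0, 1), (1, 2)]))          # path 0-1-2
print(qap_cost(3, tg, [(0, 1), (1, 2), (0, 2)]))  # triangle
```

In the path topology the pair (0, 2) is two hops apart, giving 10·1 + 5·1 + 2·2 = 19, whereas the extra link of the triangle reduces the cost to 17. A disconnected pair yields an infinite cost, which is one way such a sketch could flag an unfeasible topology.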

2.3 Network On Chip

The main goal of a NoC is to improve the communication between components of a chip, especially if compared to a traditional communication bus (YESIL; TOSUN; OZTURK, 2016). NoCs also provide a more scalable communication method if compared to traditional ones (YESIL; TOSUN; OZTURK, 2016).

A NoC consists of two major components: routers and links (ZEFERINO; SUSIN, 2003). Routers are responsible for transmitting information (packets) between each other through the links (SOTERIOU et al., 2009). The packets may pass through multiple links and routers before reaching their destination. The routers’ behaviour is described by routing algorithms, which define the path to be travelled by the packets (SOTERIOU et al., 2009). There are four core NoC features that describe the message transfers: routing algorithms, switching, flow control, and arbitration.

Routing algorithms describe the path to be taken by a packet. According to Zeferino, a routing algorithm impacts a NoC’s connectivity, deadlock and livelock freedom, adaptability, and fault tolerance (ZEFERINO; SUSIN, 2003). Connectivity is the capacity of sending packets from and to any core. Deadlock and livelock freedom guarantee that all packets will arrive at their destination. Adaptability is related to flexibility, i.e. the capacity of adapting to different topologies. Fault-tolerant routing algorithms attempt to guarantee connectivity even when the NoC has faulty components (ZEFERINO, 2003).

Switching describes how packets are transferred from the input to the output of a router. Some switching methods are circuit switching, store-and-forward, and wormhole. Circuit switching reserves a path until the entire message is transmitted. Store-and-forward packets have a header with information about their destination, and each packet is stored in a buffer at every router until its next hop is decided. Wormhole switching divides packets into flits and, if a flit’s output path is free, the flit is not stored in a buffer but transmitted directly to the communication channel (ZEFERINO, 2003).

Flow control describes what shall be done with packets unable to acquire some resource. This may happen, for example, if there are numerous packets travelling through the NoC, overloading it. Depending on the flow control, a packet may be discarded, temporarily stored, or have its route changed (ZEFERINO, 2003).

Arbiters, in turn, are responsible for redirecting packets inside a router, i.e. selecting among its input paths. This is needed when a router simultaneously receives multiple packets competing for the same output path. The arbiter is then responsible for deciding which packets will have access to the resources first. There are centralised (one per router) and distributed (one per path) arbiters. Some examples of arbitration mechanisms are round-robin, first-come-first-served, and least recently served (ZEFERINO, 2003).
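To make the round-robin mechanism mentioned above concrete, the following sketch models a single arbiter that grants one requesting input port per cycle and then rotates the priority. The class name and interface are assumptions made for this illustration only; they do not describe any specific router design cited in this work.

```python
from typing import List, Optional


class RoundRobinArbiter:
    """Grants one requesting input port per cycle, rotating the priority."""

    def __init__(self, num_ports: int) -> None:
        self.num_ports = num_ports
        self.next_port = 0  # port holding the highest priority in the next cycle

    def grant(self, requests: List[bool]) -> Optional[int]:
        """Return the granted port index, or None if no port is requesting."""
        for offset in range(self.num_ports):
            port = (self.next_port + offset) % self.num_ports
            if requests[port]:
                # The granted port gets the lowest priority in the next cycle.
                self.next_port = (port + 1) % self.num_ports
                return port
        return None


arbiter = RoundRobinArbiter(3)
grants = [arbiter.grant([True, False, True]) for _ in range(3)]
print(grants)
```

With ports 0 and 2 requesting continuously, the grants alternate between them, so neither port can starve the other.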


The next two NoC aspects are the focus of this work and are explored hereafter: topologies and fault tolerance.

2.3.1 Topologies

The NoC components may be distributed in a chip regularly or irregularly. Regular topologies tend to be used for general purpose applications and have reduced design time (SRINIVASAN; CHATHA; KONJEVOD, 2006). Mesh (Figure 2a), Torus (Figure 2b), Ring (Figure 2c), and Honeycomb (Figure 2d) are examples of regular NoC topologies (ZEFERINO; SUSIN, 2003; HEMANI et al., 2000; BONONI; CONCER, 2006). Routing algorithms for regular NoCs are often simple, since they are based on the regular distribution of resources. Some examples of routing algorithms for regular NoCs are XY (DEHYADGARI et al., 2005), and DyXY (LI; ZENG; JONE, 2006).

Figure 2: Example of regular NoC topologies. (a) Mesh; (b) Torus topology; (c) Ring topology; (d) Honeycomb topology.
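The simplicity of routing on regular topologies can be illustrated with XY routing on a mesh, in which a packet first travels along the X dimension until it reaches the destination column and only then along the Y dimension. The sketch below assumes that routers are addressed by (x, y) coordinates; the function name `xy_route` is hypothetical.

```python
from typing import List, Tuple

Coordinate = Tuple[int, int]


def xy_route(src: Coordinate, dst: Coordinate) -> List[Coordinate]:
    """Routers visited from src to dst: X dimension first, then Y dimension."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:  # correct the X offset first
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:  # then correct the Y offset
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


print(xy_route((0, 0), (2, 1)))
```

Because the route depends only on the coordinates of the source and destination, no routing table is needed. This is precisely the property that is lost once the topology becomes irregular.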

On the other hand, irregular topologies tend to be tailored for specific-purpose applications (CHOUDHARY; GAUR; LAXMI, 2011). Irregular topologies may also be obtained from regular NoCs in which one or more components have a permanent failure (ZHANG et al., 2009). Therefore, the study of fault tolerance in irregular NoCs is interesting even for the regular topology scenario. Irregular topologies can potentially improve area, energy consumption, and performance if compared to regular ones (SRINIVASAN; CHATHA; KONJEVOD, 2006). However, their routing algorithms cannot depend on a regular distribution of components and are thus not as simple. Even so, multiple algorithms have been developed and benchmarked with performance improvements in mind (MILFONT et al., 2017). Graphs IG0, IG1, and IG2 (Figure 3) are examples of irregular NoCs.

Figure 3: Examples of irregular NoC topologies. (a) IG0; (b) IG1; (c) IG2.

There are several ways to generate specific-purpose irregular NoCs. For example, Srinivasan, Chatha, and Konjevod proposed to use slicing trees and linear programming (SRINIVASAN; CHATHA; KONJEVOD, 2005). Pinto, Carloni, and Sangiovanni-Vincentelli applied a heuristic to a previously proposed Constraint-Driven Communication Synthesis (PINTO; CARLONI; SANGIOVANNI-VINCENTELLI, 2003). Ho and Pinkston’s work is based on a recursive bisection technique (HO; PINKSTON, 2003). Metaheuristics are also commonly used for generation; a few examples are the works of (KREUTZ et al., 2005), (NEEB; WEHN, 2008), (MESQUITA, 2016), and (CHOUDHARY et al., 2010).

2.3.2 Fault Tolerance

Faults may occur in both regular and irregular topologies, and may affect links, routers or even cores (AZAD et al., 2016). There are two types of faults: transient and permanent. Transient faults may be the result of noise or interference (MILFONT et al., 2017). They are hard to correct, but do not compromise the behaviour of the circuit for a long period (MILFONT et al., 2017). In contrast, permanent faults may happen due to physical damage or fabrication problems (MILFONT et al., 2017). NoCs are jeopardised by permanent faults, and many works focus on dealing with them (AZAD et al., 2016) in order to increase the circuit’s lifespan.

Faults may render a topology unfeasible, e.g. by creating two isolated (mutually unreachable) areas, which is clearly an undesirable scenario (CHANG et al., 2011). Some examples are illustrated by Graphs DISCG0, DISCG1, DISCG2, and DISCG3 (Figures 4a, 4b, 4c, and 4d, respectively). In Figures 4a and 4c, the dotted nodes represent faulty routers, while in Figures 4b and 4d, the dotted lines represent faulty links. Graphs DISCG0 and DISCG1 illustrate faults rendering regular NoCs unfeasible. Similarly, DISCG2 and DISCG3 represent NoCs disconnected after failures.

Figure 4: Examples of areas isolated in NoCs after faults. (a) DISCG0; (b) DISCG1; (c) DISCG2; (d) DISCG3.
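Whether a set of faults isolates part of a NoC can be checked by removing the faulty routers and links and testing whether the remaining graph is still connected. The sketch below is an assumption-laden illustration rather than the fault injection procedure of this work: the function name, its parameters, and the 3x3 mesh used as input are all chosen for the example.

```python
from typing import Dict, Iterable, Set, Tuple

Link = Tuple[int, int]


def is_connected_after_faults(routers: Iterable[int],
                              links: Iterable[Link],
                              faulty_routers: Iterable[int] = (),
                              faulty_links: Iterable[Link] = ()) -> bool:
    """True if all surviving routers can still reach each other."""
    alive: Set[int] = set(routers) - set(faulty_routers)
    if not alive:
        return False
    dead_links = {frozenset(link) for link in faulty_links}
    adjacency: Dict[int, Set[int]] = {router: set() for router in alive}
    for u, v in links:
        # A link survives only if both endpoints and the link itself are healthy.
        if u in alive and v in alive and frozenset((u, v)) not in dead_links:
            adjacency[u].add(v)
            adjacency[v].add(u)
    start = next(iter(alive))
    seen = {start}
    stack = [start]
    while stack:  # depth-first traversal over the surviving routers
        u = stack.pop()
        for v in adjacency[u] - seen:
            seen.add(v)
            stack.append(v)
    return seen == alive


# A 3x3 mesh with routers numbered row by row (illustrative input).
mesh_routers = range(9)
mesh_links = [(0, 1), (1, 2), (3, 4), (4, 5), (6, 7), (7, 8),
              (0, 3), (3, 6), (1, 4), (4, 7), (2, 5), (5, 8)]
print(is_connected_after_faults(mesh_routers, mesh_links, faulty_routers=[4]))
print(is_connected_after_faults(mesh_routers, mesh_links, faulty_routers=[1, 3]))
```

Losing the central router leaves the outer ring intact, whereas losing routers 1 and 3 isolates the corner router 0.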

There are two approaches to mitigate the impacts of a fault: architecture level, and system and application level approaches (AZAD et al., 2016). Architecture level approaches tackle fault tolerance by adding redundant components, whether routers, links, or cores (AZAD et al., 2016; CHANG et al., 2011; ZHANG et al., 2009). System and application level approaches tackle the problem by adding software flexibility, e.g. routing algorithms (AZAD et al., 2016).

2.4 Task Graphs

A Task Graph (TG) describes an application subdivided into tasks. Tasks may depend on each other. A TG is commonly modelled as a directed graph, where the vertices and edges represent tasks and the dependencies between them, respectively. Edges are often weighted, possibly representing communication cost or duration. Figure 5 illustrates a TG generated with Task Graphs For Free (DICK; RHODES; WOLF, 1998). In this TG, task 3 depends on task 2, and the communication cost from 2 to 3 is 18.


Figure 5: A Task Graph.

In the Multi-Processor System-on-Chip (MPSoC) context, NoCs may be used for MPSoC design, while TG tasks are mapped to MPSoC cores. There is not necessarily a bijection between TG and NoC edges. Mapping a TG to a NoC falls into the QAP category (BOKHARI, 1981; ROCHA, 2017). Irregular NoC topologies may be generated according to TGs using metaheuristics, such as Simulated Annealing (NEEB; WEHN, 2008) and Genetic Algorithms (CHOUDHARY et al., 2010; MESQUITA, 2016).

2.5 Metaheuristics

A core concern of Computer Science is to model problems mathematically so that their solutions can be calculated by a computer. Problems may be classified into various categories according to different aspects, e.g. running time and memory usage. Regarding running time, there exist problems known to be efficiently solvable on a computer (the P class), i.e. problems that require a polynomial number of operations. On the other hand, the NP class consists of decision problems for which a solution can be verified in polynomial time (CORMEN et al., 2009). Some examples of NP problems are the decision versions of the Quadratic Assignment Problem (BOKHARI, 1981), the Classical Vehicle Routing Problem (CVRP) (GENDREAU; POTVIN et al., 2010), and the Vertex Cover Problem (GAREY; JOHNSON; STOCKMEYER, 1974).

Although P ⊆ NP, it is unknown whether P = NP. Thus, there are NP problems for which no polynomial-time algorithm is known. Nevertheless, it is desirable to find efficient solutions even for these problems. There are techniques capable of finding the best solution (exact algorithms), such as exhaustive search and branch-and-bound. Exhaustive searches visit all the solutions looking for the optimal one. Branch-and-bound visits some solutions while pruning part of the search space; pruning occurs only if it can be mathematically proved that no solution in the pruned space is better than the current best one (BALAS; TOTH, 1983). However, the computational time of exact algorithms is often prohibitive for non-small problem instances.

Depending on the desired results, a non-optimal solution (a solution different from the best one) may be sufficient. The CVRP is an example of such a problem because it may be desirable to obtain the best possible solution within a limited period of time (GENDREAU; POTVIN, 2005). These solutions are called local optima, in contrast to the global optimum. Thus, strategies for finding local optima while still searching for the global optimum were developed. Metaheuristics are an example of such strategies: controlled local searches capable of finding multiple local optima, often analogous to some natural phenomenon (GENDREAU; POTVIN et al., 2010). Some examples of metaheuristics are Simulated Annealing, Tabu Search, Genetic Algorithms, and Memetic Algorithms. Hybridisation is also possible.

Metaheuristics are used for generating irregular topologies because topology generation is an NP-hard problem, a variation of the Steiner Tree Problem (RAVI et al., 2001; MESQUITA, 2016).

2.5.1 Tabu Search

The main idea of the Tabu Search is to find local optima while escaping from recently found solutions (GENDREAU; POTVIN, 2005). One solution can be obtained from another by performing a neighbourhood step operation on the Search Space using the Neighbourhood Structure (GENDREAU; POTVIN, 2005). A neighbourhood step operation depends on how the problem is modelled and on how a movement is defined (GENDREAU; POTVIN, 2005). A movement may be described, for example, as swapping, removing, or adding Neighbourhood Structure elements.

One of the key factors of Tabu Search is to prevent the search from visiting a solution multiple times. This is achieved with a short-term memory, called the Tabu List, that stores recently performed neighbourhood movements (GENDREAU; POTVIN, 2005). Essentially, if a Neighbour Solution contains a Tabu Movement, it is not considered in the Search Space. Tabu Lists are often implemented as circular queues, and their size depends on the problem and on the performed experiments (GENDREAU; POTVIN, 2005). In this case, the union operation that inserts a movement may remove the oldest inserted element. A Tabu List may be too restrictive or too permissive, depending on how the problem is modelled or even on the desired results.

The Tabu Search skeleton is described in Algorithm 1. It is the same algorithm as the one described by Gendreau and Potvin (GENDREAU; POTVIN, 2005), but with a different notation. It is important to emphasise some points also highlighted by the authors. The termination criterion depends on the problem, though it is usually defined as a number of iterations. The fitness() function is used to rank and evaluate different solutions. The selectBestNeighbour() function returns the best neighbour of the current solution considering movements not in the Tabu List. Gendreau and Potvin also state that an Aspiration Criterion may be necessary while searching the neighbourhood (GENDREAU; POTVIN, 2005). For instance, if the fitness of a solution containing a Tabu Movement is better than that of the best solution found, the movement should be performed even though it is Tabu.

Algorithm 1 Tabu Search Algorithm Skeleton.

function Tabu Search skeleton
    S ← S0                                    ▷ creates initial solution and sets current solution
    BS ← S0                                   ▷ sets the initial solution as the best found
    TL ← ∅                                    ▷ initialises Tabu List
    while ¬ termination criterion do
        S ← selectBestNeighbour(S, TL)
        if fitness(S) < fitness(BS) then
            BS ← S                            ▷ saves current solution as the best
        end if
        TL ← TL ∪ {performedMovement}
    end while
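The skeleton can be made concrete with a small sketch. The Python code below is only an illustration under stated assumptions and is not taken from Gendreau and Potvin or from the thesis implementation (which is in C++): the names `tabu_search` and `neighbours`, the representation of a movement as a hashable label, and the fixed iteration budget are all hypothetical choices.

```python
from collections import deque

def tabu_search(s0, fitness, neighbours, tabu_size=5, max_iters=100):
    """Generic Tabu Search skeleton for minimisation (illustrative sketch).

    `neighbours(s)` returns (move, solution) pairs. A move stays tabu while
    it remains in the fixed-size queue `tabu_list`.
    """
    current = best = s0
    tabu_list = deque(maxlen=tabu_size)   # circular queue: oldest move drops out
    for _ in range(max_iters):            # termination criterion: iteration count
        candidates = [
            (move, sol)
            for move, sol in neighbours(current)
            # Aspiration criterion: accept a tabu move if it beats the best.
            if move not in tabu_list or fitness(sol) < fitness(best)
        ]
        if not candidates:                # every neighbour is tabu
            break
        move, current = min(candidates, key=lambda pair: fitness(pair[1]))
        if fitness(current) < fitness(best):
            best = current
        tabu_list.append(move)
    return best

# Toy usage (assumed problem): walk over list indices, labelling each move by
# the index it leads to, so recently visited indices are tabu.
LANDSCAPE = [5, 3, 4, 1, 6]

def step_neighbours(i):
    return [(j, j) for j in (i - 1, i + 1) if 0 <= j < len(LANDSCAPE)]

best_index = tabu_search(0, lambda i: LANDSCAPE[i], step_neighbours)
```

In the toy landscape, index 1 is a local optimum; because returning to recently visited indices is tabu, the search is pushed through the worse index 2 and reaches index 3.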


3 Related Works

There are several works in the literature regarding irregular NoC topologies. Two major areas are of interest: performance and fault tolerance. The performance area focuses on improvement through topology generation, routing algorithms, physical simulations, etc. On the other hand, fault tolerance embraces topics such as maintenance of a regular NoC using spare routers and virtual topologies; topology reconfiguration; built-in router self-diagnosis; detection and handling of transient and permanent faults; routing algorithms; fault tolerance on routers and links; et cetera (SALMINEN; KULMALA; HAMALAINEN, 2008; RADETZKI et al., 2013).

3.1 Performance

Two possible ways to enhance performance are to improve the routing algorithms and to generate application-specific topologies. Two works about routing algorithms and four about topology generation are henceforth mentioned: routing table minimisation (MOTA et al., 2016), a fault-tolerant enhanced odd-even XY routing algorithm (ABEDNEZHAD; ALAVI, 2017), design of irregular topologies for heterogeneous NoCs (NEEB; WEHN, 2008), linear programming (SRINIVASAN; CHATHA; KONJEVOD, 2006), ant lion optimisation (VENKATARAMAN; KUMAR, 2019), and the genetic algorithm (MESQUITA, 2016).

There are multiple works about routing algorithms for irregular NoCs. Most of them focus on ensuring deadlock-free algorithms, and the method used to guarantee deadlock freedom directly affects performance. In addition, routing algorithms are often applied to irregular NoCs to increase their lifespan. For instance, (MOTA et al., 2016) focuses on reducing the size of the routing table to improve performance. As another example, (ABEDNEZHAD; ALAVI, 2017) uses a hybrid approach to obtain a fault-tolerant deadlock-free routing algorithm. Their solution acts as an XY routing algorithm by default and uses an enhanced odd-even model when a faulty link is found.


chain topology; then, edges are added to it by a greedy algorithm (NEEB; WEHN, 2008). The obtained graphs are compared to mesh, torus, and spidergon topologies.

(SRINIVASAN; CHATHA; KONJEVOD, 2006) generate application-specific NoCs by using Linear Programming. Their objective is to minimise power consumption while maximising performance. Thus, the physical size of the components and the distance between them are considered throughout the process.

Venkataraman and Kumar also proposed to decrease power consumption for application-specific topologies. Their work uses an ant lion optimisation technique to generate the topologies (VENKATARAMAN; KUMAR, 2019). In addition, redesigning the router architecture helped to improve the obtained results (VENKATARAMAN; KUMAR, 2019).

The Genetic Algorithm proposed in (MESQUITA, 2016) generates irregular topologies from a 2D-Mesh population. The population is submitted to mutations, in which links (circuit components) may be removed. The implemented algorithm uses single-point crossover. Due to the nature of the problem, however, single-point crossover may not contribute to the exploration of multiple neighbourhoods. In addition, improved results may be achieved if the initial population also contains different individuals, such as Torus and Honeycomb.

However, for a few works, it is necessary to review some implementation details in order to explore a wider range of solutions. In addition, the presented works focus solely on performance, while fault tolerance is mentioned as a desired feature for future projects. In contrast, the proposed work focuses on generating topologies that are simultaneously efficient and fault-tolerant.

3.2 Fault Tolerance

Two usual ways to add fault tolerance to NoCs are through routing algorithms or component redundancy. For component redundancy, one may simply duplicate the resources (links, routers, or PEs), create alternative paths inside routers, etc.

The proposed work focuses on topology link redundancy, ensuring alternative paths between two routers. Six complementary works are thus highlighted: a lightweight fault-tolerant mechanism (KOIBUCHI et al., 2008), fault-isolation circuits (LIN et al., 2009), a fault-tolerant honeycomb model (YANG et al., 2016), De Bruijn’s algorithm (HOSSEINABADY et al., 2007), Bio-inspired algorithms (BECKER; KRÖMKER; SZCZERBICKA, 2015), and the Poorest Neighbour approach (SHAH; KANNIGANTI; SOUMYA, 2017).


The lightweight fault-tolerant mechanism achieves fault tolerance by adding redundant components (KOIBUCHI et al., 2008). However, the work’s premise is to duplicate only simple components, since they are less susceptible to failure (KOIBUCHI et al., 2008). In summary, it prevents failures on routers by adding alternative paths that bypass the crossbar (KOIBUCHI et al., 2008). The work of Lin et al. uses a similar strategy (LIN et al., 2009).

In the fault-tolerant honeycomb model (YANG et al., 2016), tolerance is achieved by adding one extra input/output link per processing element. This is achieved by adding a spare router in the centre of each hexagon. The spare router is therefore connected to the six processing elements. Hence, this approach handles faults in links and routers. A message will move around a faulty link by passing through the spare router, though there is an overhead increase. If a router fails, the corresponding processing element would normally become inaccessible. The honeycomb model solves this problem by connecting two routers to a processing element. Thus, a new hexagon is simulated by nearby spare routers. In the proposed work, it was decided not to use this technique since a honeycomb model tends to occupy larger areas.

(SHAH; KANNIGANTI; SOUMYA, 2017) state that De Bruijn’s graph is widely applied in Bioinformatics. De Bruijn’s algorithm uses mathematical formulae to determine whether two nodes should be connected (HOSSEINABADY et al., 2007). The binary version of the algorithm focuses on associating every possible node with four edges, with only a few exceptions (HOSSEINABADY et al., 2007). De Bruijn’s algorithm achieves 100% fault tolerance for links (SHAH; KANNIGANTI; SOUMYA, 2017). However, this approach is unfeasible because nonplanar graphs may be generated.

The work developed by (BECKER; KRÖMKER; SZCZERBICKA, 2015) seems promising since it evaluates heuristics and Bio-inspired algorithms to generate fault-tolerant graphs. However, the work is superficial: it lacks core information such as algorithm details.

The Poorest Neighbour Algorithm (SHAH; KANNIGANTI; SOUMYA, 2017) is a deterministic algorithm that adds link fault tolerance to a NoC given its application graph. Compared to De Bruijn’s algorithm, the generated topology considerably reduces the number of necessary links. Additionally, the authors claim that the Poorest Neighbour achieves 100% link fault tolerance.

While simulating the algorithm provided by the authors, problems were found. The algorithm was tested on only slightly more than forty graphs. Of these graphs, just a small subset represents applications for which fault tolerance could not be added manually and effortlessly. In addition, the algorithm has three different implementations that are not mentioned in the paper, and, given the same graph, these implementations may produce different outputs. However, the most compromising problem is that building a NoC directly from an outputted graph may be unfeasible. This is due to the algorithm’s nature: routers are simulated with unlimited ports, and links are never removed from the topology, only inserted.

Some of the presented works tackle the problem of adding fault tolerance, focusing either on regular or on irregular topologies. The works that generate fault-tolerant topologies need to be enhanced to consider necessary limitations (such as a port limit per router). Without such limitations, inapplicable (nonplanar) graphs are more likely to be obtained. On the other hand, the proposed work focuses on generating irregular topologies in which routers have a maximum of four ports. In addition, a limit on the number of links is required. Together, these restrictions increase the odds of obtaining planar solutions.


4 Methodology

Irregular topologies outperform regular topologies for specific applications, since their structure tends to be more similar to that of a given TG. Adding fault tolerance is desirable, and in many cases essential, to increase the lifespan of a NoC, whether its topology is regular or irregular. Thus, the proposed work focuses on generating high-performance irregular NoC topologies with link redundancy to increase fault tolerance.

The proposal is to generate irregular NoC topologies with redundant links for fault tolerance. The topologies generated are evaluated in order to estimate how latency can be affected by the approach.

To generate the topologies, software was implemented in C++. A high-level design tool was chosen because this work attempts to explore different topologies through heuristic algorithms.

Latency is estimated by the QAP function, i.e. the number of hops weighted by the TG weight, as detailed in Section 2.2. For a given TG, the number of routers is fixed, and the number of links (ε) is used to classify different topologies. It is also desirable to evaluate whether there is a limit on the number of links beyond which performance no longer improves significantly. Nevertheless, topologies for efficient circuits with a long lifespan are expected to be outputted. The proposed algorithm has three main stages: initial topology generation, best solution search, and fault injection. These stages are listed in Algorithm 2.

Algorithm 2 Methodology.

function MAIN(graph: TG; int: ε, tabuListSize, terminationCriterion)
    S ← GENERATE_INITIAL_SOLUTION(TG, ε)
    S ← TABU_SEARCH(TG, S, ε, tabuListSize, terminationCriterion)
    FAULT_INJECTION(S, TG)
end function


value. The second step uses Tabu Search and the QAP function to generate a locally optimal solution. The third step stresses the generated topology multiple times by randomly choosing links to fail.

4.1 Definitions and Assumptions

A set of definitions and assumptions is used throughout the remaining sections. The topics discussed are solution representation and feasible solutions.

4.1.1 Solution Representation

Any NoC (consequently any solution) can be represented as a graph. Thus, there is a bijection between nodes (vertices) of a graph and routers of a topology, and edges of a graph and links of a topology. These terms are hereafter used interchangeably. For the current problem, a TG is read as an adjacency, while a solution is represented as a triangular adjacency matrix with no main diagonal.

Due to problem restrictions, the links on solutions are bidirectional. Thus, solutions can be represented as symmetric adjacency matrices. Since self-loops are not allowed, it is not necessary to store the matrix’s main diagonal. Hence, in order to save memory, only the elements below the main diagonal are stored. For example, the graph represented by the matrix on the left would be stored as the matrix on the right,

\[
\begin{bmatrix}
0 & 10 & 0 & 7 & 5 & 0 \\
10 & 0 & 1 & 0 & 2 & 0 \\
0 & 1 & 0 & 4 & 0 & 0 \\
7 & 0 & 4 & 0 & 2 & 2 \\
5 & 2 & 0 & 2 & 0 & 0 \\
0 & 0 & 0 & 2 & 0 & 0
\end{bmatrix}
\rightarrow
\begin{bmatrix}
10 & & & & \\
0 & 1 & & & \\
7 & 0 & 4 & & \\
5 & 2 & 0 & 2 & \\
0 & 0 & 0 & 2 & 0
\end{bmatrix}.
\tag{4.1}
\]

Thus, by the sum of an arithmetic progression, instead of storing |V|² elements, only

\begin{align}
\frac{n(a_1 + a_n)}{2} &= \frac{|V|(0 + |V| - 1)}{2} \tag{4.2} \\
&= \frac{|V|^2 - |V|}{2} \tag{4.3}
\end{align}

elements are stored.

Throughout the Tabu Search, the only information used for solution edges is whether they exist or not. Thus, the solution is stored as a boolean matrix with no main diagonal. For instance, the previous graph would be represented as the following matrix:

\[
\begin{bmatrix}
10 & & & & \\
0 & 1 & & & \\
7 & 0 & 4 & & \\
5 & 2 & 0 & 2 & \\
0 & 0 & 0 & 2 & 0
\end{bmatrix}
\rightarrow
\begin{bmatrix}
1 & & & & \\
0 & 1 & & & \\
1 & 0 & 1 & & \\
1 & 1 & 0 & 1 & \\
0 & 0 & 0 & 1 & 0
\end{bmatrix}.
\tag{4.4}
\]

This matrix representation was implemented to represent graphs. However, the definitions of Section 2.1 will be used in the pseudocode. Henceforth, a solution will be represented as an undirected and unweighted graph.
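The storage scheme of this section can be sketched in a few lines. The Python code below is an assumption-laden illustration rather than the thesis code (which is in C++ and not shown): the function names `tri_index` and `to_lower_triangle`, the 0-based vertex numbering, and the row-major flattening of the triangle are hypothetical choices.

```python
# Symmetric weighted adjacency matrix of the example graph (left side of the
# first matrix equation in this section).
MATRIX = [
    [0, 10, 0, 7, 5, 0],
    [10, 0, 1, 0, 2, 0],
    [0, 1, 0, 4, 0, 0],
    [7, 0, 4, 0, 2, 2],
    [5, 2, 0, 2, 0, 0],
    [0, 0, 0, 2, 0, 0],
]

def tri_index(i, j):
    """Map the unordered vertex pair {i, j}, i != j, to its position in the
    flattened strictly-lower-triangular matrix (row-major order)."""
    if i == j:
        raise ValueError("self-loops are not stored")
    row, col = max(i, j), min(i, j)
    # Rows 1..row-1 hold 1 + 2 + ... + (row - 1) = row * (row - 1) / 2 entries.
    return row * (row - 1) // 2 + col

def to_lower_triangle(matrix):
    """Keep only 'edge exists' flags for the entries below the main diagonal."""
    n = len(matrix)
    return [matrix[r][c] != 0 for r in range(1, n) for c in range(r)]

stored = to_lower_triangle(MATRIX)
```

For the six-vertex example, `stored` holds (36 − 6)/2 = 15 booleans, and `stored[tri_index(3, 5)]` answers whether vertices 3 and 5 are linked regardless of argument order.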

4.1.2 Feasible Solution

Henceforth, a solution is considered feasible if it meets the following restriction. Assume that the corresponding solution graph is S(V, E), and that degree(v) is a function that returns the degree of vertex v, i.e. the number of edges incident to it. The restriction is then described by

\[ \forall v\,(v \in S_V \rightarrow 2 \le \mathrm{degree}(v) \le 4). \tag{4.5} \]

This restriction guarantees that the routers have a standard design, with a maximum of four output ports. This router architecture is commonly found in regular NoCs, such as Mesh-2D, and Torus (JANTSCH; TENHUNEN et al., 2003). Thus, the obtained solutions are more likely to have lower power consumption, and simpler design if compared to the solutions generated by the Poorest Neighbour (SHAH; KANNIGANTI; SOUMYA, 2017), and De Bruijn’s algorithm (HOSSEINABADY et al., 2007). In addition, this restriction aims to increase reliability through redundancy since there are at least two edges through which it is possible to reach a node, i.e. at least one alternative path. Therefore, link redundancy is achieved; potentially improving reliability.

It is important to highlight that Equation 4.5 is not sufficient to guarantee graph planarity. Therefore, non-planar graphs may be generated by the Algorithm and classified as feasible solutions. In other words, some solutions may not be used in 2D-MPSoC design.


4.2 Initial Topology Generation

In order to generate an initial feasible solution for a given combination of TG and ε, there are two possible, not necessarily mutually exclusive, scenarios. First, the number of edges in the original TG may be different from the ε value; thus, it is necessary to remove or add edges until both values match. Second, the TG may not be feasible; thus, it is necessary to move some existing edges until the condition of Equation 4.5 is met.

For some values of ε, no feasible solution is possible. Thus, it is necessary to validate its value before initiating the process of generating an initial topology. This condition can be easily verified using the Handshaking Lemma (WILSON, 1979),

\[ \sum_{v \in V} \mathrm{degree}(v) = 2|E|. \tag{4.6} \]

Since Equation 4.5 bounds every degree between 2 and 4, the average degree must lie in the same range. Therefore, Equation 4.5 can only be satisfied if, for the desired solution,

\[ 2 \le \frac{2\epsilon}{|V|} = \frac{2|E|}{|V|} \le 4. \tag{4.7} \]
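As a quick illustration of this check, the sketch below tests the condition on ε with integer arithmetic only. It is a hypothetical helper (the name `epsilon_is_valid` is not from the thesis), written in Python for brevity, and it verifies a necessary condition only: passing the check does not by itself produce a feasible graph.

```python
def epsilon_is_valid(num_vertices, epsilon):
    """Necessary condition derived from the Handshaking Lemma: the average
    degree 2 * epsilon / |V| must lie between 2 and 4 (inclusive).

    Multiplying through by |V| avoids floating-point division.
    """
    return 2 * num_vertices <= 2 * epsilon <= 4 * num_vertices
```

For a nine-router topology, for example, the accepted values are 9 ≤ ε ≤ 18.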

In summary, the algorithm for generating the initial topology (Algorithm 3) is divided into four steps: asserting that ε is valid, converting the directed edges TG_E to undirected edges, fitting |E| to ε, and making the solution feasible. The last two steps are discussed in Sections 4.2.1 and 4.2.2, respectively.

Algorithm 3 Generate Initial Solution Graph.

 1: function GENERATE_INITIAL_SOLUTION(graph: TG, int: ε)
 2:     if 2 ≤ 2ε/|TG_V| ≤ 4 then
 3:         S_E ← {{v1, v2} | ∃e ∃w (e ∈ TG_E ∧ (e = (v1, v2, w) ∨ e = (v2, v1, w)))}
 4:         S ← (TG_V, S_E)
 5:         S ← FIT_TO_EPSILON(S, ε)
 6:         S ← MAKE_FEASIBLE(S)
 7:         return S
 8:     end if
 9:     return (∅, ∅)
10: end function

The solution is represented as an undirected, unweighted graph. Therefore, the conversion process (Algorithm 3, line 3) “removes” the direction and weight of each edge before adding it to the graph. This process can be visualised in Figure 6. Suppose that Graph TG = GISEC0 (Figure 6a). Then, both edges (5, 2, 6), i.e. the edge from node 5 to node 2 with weight 6, and (2, 5, 2) would be converted to {2, 5}; thus, two edges are collapsed into one. In addition, edge (1, 0, 4) would be converted to {0, 1}. After all conversions are performed, the resulting Graph S is represented in Figure 6b (Graph GISEC1). Note that the obtained Graph is undirected and unweighted.

[Figure 6 panels: (a) GISEC0, (b) GISEC1.]

Figure 6: Example of Task Graph edge conversion.
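The edge conversion can be sketched as a one-line set comprehension. The Python code below is a hypothetical illustration (the name `to_undirected` and the use of `frozenset` for undirected edges are assumptions, and the TG is assumed to contain no self-loops); only the three edges quoted in the text are used, since the full GISEC0 graph is not reproduced here.

```python
def to_undirected(tg_edges):
    """Collapse directed weighted edges (v1, v2, w) into undirected,
    unweighted edges, each represented as a frozenset {v1, v2}."""
    return {frozenset((v1, v2)) for v1, v2, _weight in tg_edges}

# The three TG edges mentioned in the text.
tg_edges = [(5, 2, 6), (2, 5, 2), (1, 0, 4)]
solution_edges = to_undirected(tg_edges)
```

Because a set cannot hold duplicates, the two opposite edges between nodes 2 and 5 collapse into the single undirected edge {2, 5}.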

4.2.1 Fitting to Epsilon

This stage guarantees that the initial solution has a number of edges corresponding to the epsilon restriction parameter (ε = |E|). For example, if ε < |E|, then edges need to be removed from the graph. On the other hand, if ε > |E|, then edges need to be added to the graph. In order to explore a wider range of solutions over multiple executions, these edges are randomly deleted and inserted, as detailed in Algorithm 4.

Algorithm 4 Fit Solution’s number of edges to ε.

function FIT_TO_EPSILON(graph: S, int: ε)
    while ε < |S_E| do
        S_E ← S_E − {random(S_E)}
    end while
    while ε > |S_E| do
        S_E ← S_E ∪ {random(S_E^C)}           ▷ S_E^C: edges absent from S
    end while
    return S
end function
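A runnable counterpart of Algorithm 4 might look as follows. This is a Python sketch under assumptions, not the C++ implementation of the thesis: edges are modelled as `frozenset` pairs, vertices are assumed to be sortable (e.g. integers), the `rng` parameter is a hypothetical hook for reproducible runs, and the range check on ε is an added safeguard.

```python
import random
from itertools import combinations

def fit_to_epsilon(vertices, edges, epsilon, rng=random):
    """Randomly delete or insert undirected edges until |edges| == epsilon."""
    edges = set(edges)
    all_edges = {frozenset(pair) for pair in combinations(vertices, 2)}
    if not 0 <= epsilon <= len(all_edges):
        raise ValueError("epsilon is outside the range of possible edge counts")
    while len(edges) > epsilon:
        # Sorting gives a stable order so that a seeded rng is reproducible.
        edges.remove(rng.choice(sorted(edges, key=sorted)))
    while len(edges) < epsilon:
        missing = all_edges - edges        # complement of the current edge set
        edges.add(rng.choice(sorted(missing, key=sorted)))
    return edges
```

Only one of the two loops runs for a given input: the first shrinks the edge set, the second grows it, so the original edges are either a superset or a subset of the result.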


4.2.2 Making Feasible

This stage guarantees that the initial solution is feasible. For instance, Graph UNF (Figure 7) does not represent a feasible solution because nodes 4 and 7 have degree 5, and nodes 0 and 2 have degree 1. In addition, the condition stated in Equation 4.7 would be true for Graph UNF if ε = 13. Therefore, it could have been outputted by the FIT_TO_EPSILON() function (Section 4.2.1). In order to properly understand the MAKE_FEASIBLE() function, it is necessary to comprehend the behaviour of the DEL_EDGE() and ADD_EDGE() functions.

Figure 7: Example of unfeasible solution – UNF.

4.2.2.1 Deleting Edges - DEL_EDGE()

To select an edge to be deleted, this function simply selects the node with the largest degree (ldn). Then, it selects the neighbour of ldn with the largest degree (ldneigh). Afterwards, it removes the edge between these two nodes. If multiple vertices share the largest degree, any of them can be chosen randomly. The pseudocode is described in Algorithm 5.

Algorithm 5 Deletes the edge whose incident nodes have the largest possible degrees.

function DEL_EDGE(graph: S)
    ldn ← random(argmax(degrees(S_V)))          ▷ largest degree node
    edges ← {e | e ∈ S_E ∧ ldn ∈ e}             ▷ edges incident to ldn
    neighs ← {v | v ∈ e ∧ v ≠ ldn ∧ e ∈ edges}  ▷ nodes adjacent to ldn
    ldneigh ← random(argmax(degrees(neighs)))   ▷ largest degree neighbour
    edge ← {ldn, ldneigh}
    S_E ← S_E − {edge}
    return S
end function


To illustrate this procedure, suppose that S = DUE0 (Figure 8a). In Graph DUE0, degree(v) = 4 for every v ∈ {0, 1, 3, 5}. Hence, any of these nodes can be selected. Let ldn = 0; since degree(1) = degree(5) = 4, either of two edges can be deleted: {0, 1} or {0, 5}. This scenario is illustrated by Graph DUE1 (Figure 8b), where removable edges are dotted. If {0, 1} is chosen, Graph DUE2 (Figure 8c) is obtained. If the function were called for Graph DUE2 (S = DUE2), edge {3, 5} would be deleted. The resulting Graph is illustrated in Figure 8d, where the removed edge is dotted.

[Figure 8 panels: (a) DUE0, (b) DUE1, (c) DUE2, (d) DUE3.]

Figure 8: Example of Delete Edges Until Epsilon.
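The deletion rule can be sketched in Python as follows. This is an illustrative assumption-based version, not the thesis code: edges are `frozenset` pairs, the function returns the deleted edge alongside the new edge set for convenience, and the graph is assumed to contain at least one edge. The example graph is hypothetical, because the DUE graphs of Figure 8 are not reproduced here.

```python
import random

def degrees(vertices, edges):
    """Return {vertex: degree} for an undirected edge set of frozensets."""
    deg = {v: 0 for v in vertices}
    for edge in edges:
        for v in edge:
            deg[v] += 1
    return deg

def del_edge(vertices, edges, rng=random):
    """Remove the edge joining a largest-degree node (ldn) to its
    largest-degree neighbour (ldneigh); ties are broken at random.
    Returns (new_edges, deleted_edge)."""
    deg = degrees(vertices, edges)
    top = max(deg.values())
    ldn = rng.choice(sorted(v for v in vertices if deg[v] == top))
    neighs = sorted(v for e in edges if ldn in e for v in e if v != ldn)
    top_neigh = max(deg[v] for v in neighs)
    ldneigh = rng.choice([v for v in neighs if deg[v] == top_neigh])
    edge = frozenset((ldn, ldneigh))
    return set(edges) - {edge}, edge

# Hypothetical graph: nodes 0 and 1 both have degree 3 and are adjacent,
# so edge {0, 1} is deleted whichever of them is picked as ldn.
example = {frozenset(p) for p in [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3)]}
remaining, deleted = del_edge(range(4), example, random.Random(0))
```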

4.2.2.2 Adding Edges - ADD_EDGE()

This function simply selects the two nodes with the smallest degrees (sdn and sdn2) and adds an edge between them. The Algorithm expects a set of prohibited edges, a “Tabu List” named TL. If the Algorithm attempts to add edge {sdn, sdn2}, and it already exists ({sdn, sdn2} ∈ S_E) or is tabu ({sdn, sdn2} ∈ TL), it is necessary to change vertex sdn2 to a previously unvisited smallest-degree node. The process is repeated until an edge can be inserted into the Graph. This behaviour is detailed in Algorithm 6. The random function is necessary because there may exist multiple smallest-degree nodes in a Graph.

For example, suppose that S = AUE0 (Figure 9a), and TL = ∅. The possible values for sdn are 1, 2, 3, 4, or 5. Suppose that sdn = 3. Then, the possible values for sdn2 are 1, 2, 4, or 5. If sdn2 = 4, the edge {sdn, sdn2} cannot be added to the Graph since {3, 4} ∈ S_E. This scenario is illustrated by Graph AUE1 (Figure 9b), where the dashed edges are addable, and the dotted edge cannot be inserted. If sdn2 = 5, then the Graph S obtained is represented by AUE2 in Figure 9c. If the function is called for AUE2 with TL = { {1, 2} }, the possible values for sdn and sdn2 are sdn = random({1, 2, 4}) and sdn2 = random({1, 2, 4} − {sdn}). In other words, one random edge in the set { {1, 2}, {1, 4}, {2, 4} } is a candidate for insertion. However, {1, 2} ∈ TL, thus it cannot be inserted into the Graph. Graph AUE3 (Figure 9d) illustrates this scenario, where dashed links can be added, while the dotted link cannot.

Algorithm 6 Adds edge between the two nodes with the smallest possible degrees.

function ADD_EDGE(graph: S, tabulist: TL)
    sdn ← random(argmin(degrees(S_V)))          ▷ smallest degree node
    V ← {sdn}                                   ▷ set of visited vertices
    repeat
        sdn2 ← random(argmin(degrees(S_V − V)))
        edge ← {sdn, sdn2}
        V ← V ∪ {sdn2}
    until edge ∉ S_E ∧ edge ∉ TL
    S_E ← S_E ∪ {edge}
    return S
end function

[Figure 9 panels: (a) AUE0, (b) AUE1, (c) AUE2, (d) AUE3.]

Figure 9: Example of Add Edges Until Epsilon.
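A Python sketch of the insertion rule is given below. It is an illustration under assumptions rather than the thesis implementation: edges and tabu entries are `frozenset` pairs, the added edge is returned alongside the new edge set, and the `ValueError` raised when every candidate has been visited is a safeguard that Algorithm 6 itself does not specify. The example graph is hypothetical, since the AUE graphs of Figure 9 are not reproduced here.

```python
import random

def add_edge(vertices, edges, tabu, rng=random):
    """Insert an edge between two smallest-degree nodes, skipping edges that
    already exist or are tabu. Returns (new_edges, added_edge)."""
    deg = {v: 0 for v in vertices}
    for e in edges:
        for v in e:
            deg[v] += 1
    low = min(deg.values())
    sdn = rng.choice(sorted(v for v in vertices if deg[v] == low))
    unvisited = set(vertices) - {sdn}
    while unvisited:
        low2 = min(deg[v] for v in unvisited)
        sdn2 = rng.choice(sorted(v for v in unvisited if deg[v] == low2))
        unvisited.discard(sdn2)            # never retry the same candidate
        edge = frozenset((sdn, sdn2))
        if edge not in edges and edge not in tabu:
            return set(edges) | {edge}, edge
    raise ValueError("no insertable edge for the selected node")

# Hypothetical graph: node 2 is isolated, and edge {0, 2} is tabu,
# so the only insertable edge is {1, 2}.
current = {frozenset((0, 1))}
tabu = {frozenset((0, 2))}
updated, added = add_edge(range(3), current, tabu, random.Random(0))
```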

4.2.2.3 Make Feasible Algorithm

The algorithm presented in this section swaps edges’ positions until the obtained solution is feasible (Equation 4.5). It consists of deleting edges from the nodes with the largest degrees and adding them between the nodes with the smallest degrees until the solution is feasible. The last deleted edge cannot be added back in the same iteration; otherwise, an infinite loop may occur. There is a chance that a disconnected graph is generated during this process. The MAKE_FEASIBLE() function is detailed in Algorithm 7.

Algorithm 7 Make a Solution Feasible.

function MAKE_FEASIBLE(graph: S)
    while min(degrees(S_V)) < 2 ∨ max(degrees(S_V)) > 4 do
        S2 ← DEL_EDGE(S)
        TL ← S_E − S2_E                       ▷ identifies deleted edge and creates Tabu List
        S ← ADD_EDGE(S2, TL)
    end while
    return S
end function

Suppose, for example, that S = MFG0 (Figure 10a). Then, during the first iteration, either edge {0, 3} or edge {0, 5} is deleted, and the result is stored in S2. If {0, 5} is randomly chosen, then S2 = MFG1 (Figure 10b), where the deleted edge is dotted. Afterwards, the Algorithm randomly adds one of the edges {1, 4}, {2, 4}, or {5, 4}. If edge {2, 4} is chosen, then Graph MFG2 (Figure 10c) is obtained. The removed and inserted edges are respectively drawn as dotted and dashed lines in the Figure. After this step, the Algorithm stops, since MFG2 is a feasible solution.

[Figure 10 panels: (a) MFG0, (b) MFG1, (c) MFG2.]

Figure 10: Example of making an unfeasible solution feasible.

4.3 Best Solution Search – Tabu Search

Given an initial feasible solution, a local search (Tabu Search) is performed to optimise the QAP function (Section 2.2). The Tabu Search skeleton was discussed in Section 2.5.1. The Tabu Search per se is not complicated. However, as stated by Gendreau and Potvin, it is necessary to properly analyse the problem at hand in order to efficiently represent a solution and the neighbourhood step. The Tabu List contains recently deleted edges, preventing them from being added again to the solution in the next few iterations. The major efforts of the proposed Tabu Search focus on the Neighbourhood Search steps, in order to explore as many feasible solutions as possible. Section 4.3.2 details the Tabu List implementation, while Section 4.3.3 describes the Neighbourhood Search Algorithms. The behaviour of the proposed Tabu Search is described by Algorithm 8.

Algorithm 8 Implemented Tabu Search.

function TABU_SEARCH(graph: TG, S0; int: ε, tabuListSize, termCrit)
    S ← S0
    BS ← S0                                   ▷ sets the initial solution as the best found
    TL ← ∅                                    ▷ empty circular queue
    count ← 0
    while count < termCrit do
        NS ← NEIGHBOURHOOD_SEARCH(S, ∅, n)    ▷ set of n random neighbours
        BN ← SELECT_BEST_NEIGHBOUR(NS, TG)
        if FITNESS(BN, TG) ≥ FITNESS(BS, TG) then
            REMOVE_TABU_SOLUTIONS(NS, TL)
            if NS = ∅ then
                NS ← NEIGHBOURHOOD_SEARCH(S, TL, n)
                if NS = ∅ then
                    return BS                 ▷ no non-tabu neighbour obtainable
                end if
            end if
            BN ← SELECT_BEST_NEIGHBOUR(NS, TG)
        end if
        count ← count + 1
        S ← BN
        if FITNESS(S, TG) < FITNESS(BS, TG) then
            BS ← S                            ▷ saves current solution as the best
            count ← 0
        end if
        if |TL| = tabuListSize then
            REMOVE_OLDEST_ELEMENT(TL)
        end if
        TL ← TL ∪ {S_performedDels}
    end while
    return BS
end function

The termination criterion (termCrit) is defined as the number of iterations without improvement of the best solution. Each iteration begins with a Neighbourhood Search (NS), returning n solutions, regardless of whether they are tabu or not. In the implemented code, n = ε. Afterwards, it is verified whether the best neighbour is better than the best solution; the FITNESS() function is defined as the QAP stated in Section 2.2. Since NS may contain tabu neighbours, this condition simulates an Aspiration Criterion. If the condition is not met, tabu neighbours are removed from NS, and the best of the remaining solutions is chosen as the current solution. However, this set may be empty, i.e. only tabu neighbours were generated. If this is the case, then a maximum of n non-tabu neighbours are generated. The best solution of this new set is then chosen as the current solution. If the new set is empty, then a non-tabu neighbour of S does not exist, and the Tabu Search returns the best solution found even though the termination criterion is not yet met. Such a scenario may happen if the Tabu List size is too large, i.e. too restrictive. The deletions performed during the Neighbourhood Search to obtain BN (Best Neighbour) are added to the Tabu List.

4.3.1 Fitness Function

The fitness function is defined as the QAP function. The better the solution, the smaller its fitness value. The function behaviour is described by Algorithm 9. Dijkstra’s Algorithm is used to calculate the shortest path between two nodes of the solution. Then, the number of edges (hops) on this path is multiplied by the respective communication cost in the TG. This process is repeated for all edges in the TG.

Algorithm 9 Fitness Function implementation.

1: function FITNESS(graph: S, T G)

2: sum ← 0

3: for all (v1, v2, w) ∈ T GE do

4: sum ← sum + w · |SHORTEST_PATH(T G, v1, v2)|

5: end for

6: return sum
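As a rough illustration of Algorithm 9, the sketch below computes the same weighted hop sum in Python. It assumes that hops are counted in the candidate topology S and that every link has unit cost, so a breadth-first search is used in place of Dijkstra's Algorithm (both give the same hop counts in that case). The function names and the edge-list representation are assumptions made here, not the thesis code.

```python
from collections import deque


def hop_count(adjacency, source, target):
    """Number of links on a shortest path, found by breadth-first search."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return float("inf")  # target is unreachable from source


def fitness(topology_edges, task_graph_edges):
    """Sum of (communication cost x hop count) over all TG edges."""
    adjacency = {}
    for u, v in topology_edges:  # topology links are undirected
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return sum(w * hop_count(adjacency, a, b)
               for a, b, w in task_graph_edges)


# Hypothetical example: a 4-node ring and a TG with two communications.
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
task_graph = [(0, 2, 5), (0, 1, 3)]
ring_fitness = fitness(ring, task_graph)  # 5 * 2 hops + 3 * 1 hop
```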


4.3.2 Tabu List

The Tabu List stores all the edges deleted in a single movement. Since multiple edges may be removed in a single movement, the Tabu List stores sets of edges, i.e. it is a set of sets of edges. For the current proposal, implementing the Tabu List simply as a set of edges would be too restrictive, while storing all edges deleted and added in a single movement would be too loose a restriction.

Unless the Aspiration Criterion is being evaluated, the elements of the Tabu List will neither be present in the current solution, nor in the generated neighbours. Formally, a graph S is considered Tabu if, for a Tabu List TL,

∃te (te ∈ TL ∧ te ⊆ S_E). (4.8)

For instance, if a Tabu List element contains two edges, then they cannot be simultaneously in the graph.

As a concrete example, suppose that Graph TLG0 corresponds to some iteration's current solution (Figure 11a). Also, assume that

TL = { {{5, 2}}, {{5, 0}, {4, 1}}, {{3, 0}, {5, 3}} } (4.9)

is the Tabu List for the same iteration. Then, using the movements described in Section 4.3.3, it is possible to generate the neighbour Graphs TLG1, TLG2, and TLG3 (Figures 11b, 11c, and 11d, respectively).

[Figure 11 panels, each a graph on vertices 0 to 5: (a) TLG0, (b) TLG1, (c) TLG2, (d) TLG3]

Figure 11: Tabu List examples.


element te, though te ⊈ S_E. TLG1 and TLG2 will not be visited since they correspond to Tabu Solutions. TLG1 is Tabu because it contains edge {5, 2}, i.e. {{5, 2}} ⊆ TLG1_E. TLG2 is Tabu because it contains both edges {5, 0} and {4, 1}, i.e. {{5, 0}, {4, 1}} ⊆ TLG2_E. On the contrary, TLG3 is not Tabu because, although it contains edge {3, 0}, it does not contain edge {5, 3}, i.e. {{3, 0}, {5, 3}} ⊈ TLG3_E.
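The membership test of Equation 4.8 maps directly onto subset checks. The sketch below uses the Tabu List of Equation 4.9; the edge sets given for the three neighbours are hypothetical stand-ins, chosen only so that they contain the edges mentioned in the text, since the full graphs of Figure 11 are not reproduced here. The helper names `edge` and `is_tabu` are likewise assumptions.

```python
def edge(u, v):
    """Undirected edge, so that {5, 2} and {2, 5} compare equal."""
    return frozenset((u, v))


def is_tabu(graph_edges, tabu_list):
    """True if some Tabu List element is entirely contained in the graph."""
    return any(element <= graph_edges for element in tabu_list)


# Tabu List from Equation 4.9: a collection of sets of edges.
tabu_list = [
    {edge(5, 2)},
    {edge(5, 0), edge(4, 1)},
    {edge(3, 0), edge(5, 3)},
]

# Hypothetical edge sets (not the actual graphs of Figure 11).
tlg1 = {edge(0, 1), edge(1, 2), edge(5, 2)}
tlg2 = {edge(0, 1), edge(5, 0), edge(4, 1)}
tlg3 = {edge(0, 1), edge(3, 0), edge(4, 5)}

tabu_flags = [is_tabu(g, tabu_list) for g in (tlg1, tlg2, tlg3)]
```

A graph holding only one edge of a two-edge element, such as the third one here, is not rejected, which is what makes this representation less restrictive than a flat set of tabu edges.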

4.3.3 Neighbourhood Search

Although Gendreau and Potvin state that visiting unfeasible solutions may contribute to finding the global optimum, this scenario is not desirable in the proposed work due to the large number of unfeasible solutions. For example, the simplest scenario is to generate ring topologies¹, i.e. topologies for which all nodes have degree 2. Considering a TG with 10 vertices, a ring topology would have 10 edges as well. There are (|V|^2 − |V|)/2 = 45 possible edges (Section 4.1.1). Then, by simple Combinatorics, there are 45!/(10!(45 − 10)!) ≈ 3.19·10^9 possible graphs with 10 edges. However, there are only 10!/10 ≈ 3.6·10^5 possible ring topologies, i.e. approximately 0.0113% of the total combinations to be explored. Consequently, allowing Tabu Search to explore unfeasible solutions in this scenario is impracticable.
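The figures in this example can be reproduced with a few lines of Python. The variable names are chosen here for illustration; the ring count uses the 10!/10 expression from the text.

```python
from math import comb, factorial

vertices = 10
possible_edges = (vertices ** 2 - vertices) // 2      # 45 candidate links
graphs_with_10_edges = comb(possible_edges, vertices)  # 45! / (10! * 35!)
ring_topologies = factorial(vertices) // vertices      # 10! / 10
ring_share_percent = 100 * ring_topologies / graphs_with_10_edges

print(possible_edges, graphs_with_10_edges, ring_topologies)
print(f"{ring_share_percent:.4f}%")
```

Note that 10!/10 counts the two traversal directions of a ring separately. Treating them as the same topology would halve the ring count, which only strengthens the conclusion that feasible solutions are a tiny fraction of the search space.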

Therefore, the neighbourhood step focuses on visiting solely feasible solutions. This process is described by Algorithm 10. It randomly selects an existing edge to be deleted (e_del ∈ S_E) and randomly selects a non-existent edge to be added (e_add ∈ S_E^C). Once the values of e_del or e_add are chosen, the Algorithm attempts to generate a non-tabu solution using these edges, adding it to the set of obtained neighbours NS. If this is not possible, the edges are added to a set of non-selectable edges: T_del for e_del, and T_add for e_add. Then, not yet explored edges are selected (e_del ∉ T_del ∨ e_add ∉ T_add). The process is repeated until n neighbours are generated (|NS| = n) or all combinations of e_del and e_add are explored (T_del = S_E). It is important to highlight that in the implemented code, n = .

The condition in lines 8 and 19 guarantees that the obtained neighbour is valid. It must not be the Null Graph, N ≠ (∅, ∅), nor an already generated neighbour, N ∉ NS, nor a Tabu Neighbour after the special operations were performed, ∄te (te ∈ TL ∧ te ⊆ N_E).

¹ Even though ring topologies are regular, they are the minimum possible fault tolerant topology conceivable. Therefore, they are still eligible study objects. Notwithstanding, for a given TG, it may be interesting to analyse what would be the solution (smallest latency) for the worst case scenario (minimum fault tolerance).


Algorithm 10 Neighbourhood Search

1: function NEIGHBOURHOOD_SEARCH(graph: S, tabulist: TL, int: n)
2:     T_del ← ∅ ▷ tabu edges to del
3:     NS ← ∅ ▷ set of generated neighbours
4:     while |T_del| < |S_E| ∧ |NS| < n do
5:         e_del ← random(S_E − T_del)
6:         N ← SPECIAL_DELS(S, e_del, TL)
7:         if N_E ≠ S_E − {e_del} then ▷ some special deletion was performed
8:             if N ≠ (∅, ∅) ∧ N ∉ NS ∧ ∄te (te ∈ TL ∧ te ⊆ N_E) then
9:                 NS ← NS ∪ {N} ▷ non-tabu neighbour generated
10:             else
11:                 T_del ← T_del ∪ {e_del} ▷ e_del cannot be deleted
12:             end if
13:             continue
14:         end if
15:         T_add ← {e_del} ▷ tabu edges to add
16:         while T_add ⊂ N_E^C do
17:             e_add ← random(N_E^C − T_add)
18:             N ← SPECIAL_ADDS(N, e_add, TL ∪ {e_del})
19:             if N ≠ (∅, ∅) ∧ N ∉ NS ∧ ∄te (te ∈ TL ∧ te ⊆ N_E) then
20:                 NS ← NS ∪ {N}
21:                 break
22:             end if
23:             T_add ← T_add ∪ {e_add}
24:         end while
25:         if T_add = N_E^C then ▷ if no possible edge can be added
26:             T_del ← T_del ∪ {e_del} ▷ then e_del cannot be deleted
27:         end if
28:     end while
29:     return NS
30: end function
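The overall shape of Algorithm 10 can be illustrated with a simplified Python sketch. It deliberately omits SPECIAL_DELS and SPECIAL_ADDS: each neighbour is obtained by one plain deletion followed by one plain addition, and feasibility is delegated to a caller-supplied predicate `is_feasible`, which is a hypothetical stand-in. The data representation (frozensets of edges) and all names are assumptions, so this is not the thesis implementation.

```python
import random
from itertools import combinations


def is_tabu(edges, tabu_list):
    """Equation 4.8: some Tabu List element is a subset of the edge set."""
    return any(element <= edges for element in tabu_list)


def neighbourhood_search(edges, vertices, tabu_list, n, is_feasible,
                         rng=random):
    """Simplified sketch: one plain deletion plus one plain addition."""
    all_edges = {frozenset(pair) for pair in combinations(vertices, 2)}
    neighbours = []      # generated neighbours, as frozensets of edges
    blocked_del = set()  # plays the role of T_del
    while len(blocked_del) < len(edges) and len(neighbours) < n:
        e_del = rng.choice(sorted(edges - blocked_del, key=sorted))
        remaining = edges - {e_del}
        # Edges absent from S; e_del is excluded because it belongs to S,
        # mirroring T_add starting as {e_del}.
        addable = sorted(all_edges - edges, key=sorted)
        rng.shuffle(addable)
        for e_add in addable:
            neighbour = frozenset(remaining | {e_add})
            if (neighbour not in neighbours and is_feasible(neighbour)
                    and not is_tabu(neighbour, tabu_list)):
                neighbours.append(neighbour)
                break
        else:
            blocked_del.add(e_del)  # no valid addition: e_del is blocked
    return neighbours


# Hypothetical usage: a 4-node ring where adding edge {0, 2} is tabu.
ring = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3), (3, 0)]}
tabu = [{frozenset((0, 2))}]
found = neighbourhood_search(ring, range(4), tabu, 3,
                             lambda g: True, random.Random(0))
```

Every pass through the outer loop either records a new neighbour or blocks the chosen deletion, so the loop always terminates, as in the original algorithm.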

Throughout Algorithm 10, it is possible to generate unfeasible solutions by either deleting or adding the edges. Thus, five possible scenarios may raise depending on the chosen edges: