
UNIVERSIDADE DE LISBOA

Faculdade de Ciências

Departamento de Informática

EXPERIMENTS IN EVOLUTIONARY COLLECTIVE ROBOTICS

André González Amor de Bastos

MESTRADO EM ENGENHARIA INFORMÁTICA

Especialização em Sistemas de Informação

2011


UNIVERSIDADE DE LISBOA

Faculdade de Ciências

Departamento de Informática

EXPERIMENTS IN EVOLUTIONARY COLLECTIVE ROBOTICS

André González Amor de Bastos

DISSERTAÇÃO

Project supervised by Prof. Doutor Paulo Jorge Vaz Cunha Dias Urbano and co-supervised by Prof. Doutor Anders Lyhne Christensen

MESTRADO EM ENGENHARIA INFORMÁTICA

Especialização em Sistemas de Informação

2011


Acknowledgments

Throughout my academic path, countless people have helped me, inspired me, or simply told me the right thing at the right time. Among these people, I want to highlight my mother, my father, my godmother and my grandparents, for all the time and effort they invested in me. I want to make a special mention of my sister, Ana Rita, for reminding me that growing up is not worth it, and of Kátia Paramés, Joana Paramés and Rosa Machado, who, although not part of my family, I consider as such.

This thesis is also dedicated to all the friends I made at university, for having stood by my side in the good moments, but above all in the less good ones. I would like to thank in particular João Costa, João Lopes, Louis Philippe, Bruno Seixas, Frederico Miranda, Geraldo Nascimento, Diogo Serrano, Christian Marques, Marco Lourenço, Raúl Simplício, Davide Nunes, Bruno Correia, Joaquim Tiago Reis, Carlos Alvares, Andreia Machado, Filipe Gil, João Ferreira, Alexandre Gabriel, Vera Conceição, and everyone who, directly or indirectly, was involved with LabMag.

I would like to thank Professor Paulo Urbano, Professor Anders Christensen and Professor Sancho Oliveira for helping me, tirelessly and exemplarily, to overcome all the obstacles I faced while writing this thesis. It was truly a privilege to be supervised by such exceptional professors.

Finally, I would like to thank all the professors of the Faculdade de Ciências of the Universidade de Lisboa with whom I had the pleasure of taking classes over these years, for the knowledge and values they passed on to me. I hope to make use of everything they taught me in a way that dignifies both the faculty and the profession of computer engineer.


Summary

Evolutionary robotics is a technique that aims to create controllers and morphologies for autonomous robots using evolutionary computation techniques, such as genetic algorithms. In a manner similar to the Darwinian principle of reproduction of the fittest, the genetic algorithm selects the fittest individuals of each generation to create the next one, and so on, until a controller suitable for the chosen task is reached.

The main goal of this work is to study the emergence of collective behaviors in groups of autonomous robots, using artificial evolution techniques to evolve suitable controllers. The emergence of explicit communication protocols in the experiments is also studied, with the aim of understanding its influence on the behaviors the controllers developed.

The artificial evolution of controllers can be a time-consuming task, and the random nature of the first generations of controllers can damage the robots (behaviors may arise that lead the robots to crash into walls or into each other, etc.). To get around these problems, both the evolution of the controllers and their respective tests were carried out in a simulator. The controllers are continuous-time recurrent neural networks, whose synaptic connection weights, biases and decay rates are encoded in chromosomes. The chromosomes are created using a genetic algorithm and evaluated by an evaluation function designed specifically for the task the controllers have to perform.

The JBotEvolver simulator (http://jbotevolver.sourceforge.net) was used to carry out the experiments. This simulator, written in Java, is an open-source project that makes it possible to simulate the behavior of groups of robots in a given environment. To carry out the artificial evolution of the controller of a group of robots, the simulator uses genetic algorithms together with an evaluation function, previously chosen according to the task at hand. A graphical interface allows the user to see a representation of the environment, the robots and their behavior. All simulation parameters can be changed by editing a configuration file, which is unique to each experiment.

While the simulation is running, a set of data and configuration files is created, containing the fitness values of each generation, the best generation so far and the previously created generations, which makes it possible to evaluate the results of the evolution for a given setup.

The simulator can also carry out artificial evolution in a distributed fashion: the server side distributes chunks of information for the clients to process and send back. This feature of the simulator is extremely useful, especially when running simulations that demand a great deal of processing power.

The robots simulated in the set of experiments described in this thesis are circular, with ten centimeters of diameter, two independently driven wheels with speeds varying within [-5 cm/s, 5 cm/s], and a set of sensors that varies according to the type of experiment. A color actuator, which allows the robots to vary their color across the entire RGB spectrum, is also included in the experiments that study the emergence of explicit communication protocols.

Three distinct topics are addressed by the experiments in this thesis: self-organized aggregation, collective choice and energy management. In the study of self-organized aggregation, a controller was created through artificial evolution to make a group of robots, initially distributed randomly across the environment, aggregate into a single cluster and stay together. To evolve this controller, a population of five robots with sensors that allowed them to detect other nearby robots was used; to carry out a scalability study, however, the controller was then tested with ten, fifteen and one hundred robots.

For the study of collective choice behaviors, an environment with several luminous landmarks was created. The goal was for the robots to find and collectively choose one of these landmarks, forming a single cluster inside it. The controller created for this experiment was evolved using a group of five robots, equipped with robot proximity sensors and light sensors to detect the luminous landmarks, in an environment with two landmarks placed two meters apart. The sensors designed to detect the landmarks have a range of ten centimeters, and the robots start the simulation between the landmarks, so they have to actively search for them. To study the scalability of the controller produced, it was tested with groups of ten and fifteen robots and in an environment with five luminous landmarks. The influence of explicit communication protocols was also studied, by equipping the robots with a color actuator and with sensors that detect color changes in nearby robots.

Finally, for the study of energy management behaviors, an environment with two energy sources was created and a group of robots with limited energy was used. The goal of the controller produced was to keep the largest possible group of robots intact, that is, with no robot reaching a zero energy level, for as long as possible. The energy sources used in this experiment can only charge two robots simultaneously: if a third robot joins, all of the robots stop receiving energy. The robots used in this experiment have robot proximity sensors and sensors that detect the energy sources, as well as the number of robots each source is charging at a given moment. The controller was evolved using a group of 5 robots with an autonomy of about 80 seconds, in simulations lasting 120 seconds. To study the scalability of the controller, groups of seven and ten robots were used and the simulation time was increased to 150 and 200 seconds. Explicit communication protocols were also studied by adding a color actuator and sensors that detect color changes in nearby robots, as was done in the collective choice experiment.

Keywords: Evolutionary robotics, neural networks, genetic algorithms, collective choice, self-organized aggregation, energy management.


Abstract

Evolutionary robotics is a technique that aims to create controllers, and sometimes morphologies, for autonomous robots by using evolutionary computation techniques, such as genetic algorithms. Inspired by the Darwinian principle of survival of the fittest through reproductive success, the genetic algorithm selects the fittest individuals of each generation to create the next one, and so forth, until a suitable controller for the designated task is found or a certain number of generations has elapsed.

The main goal of this work is to study the emergence of collective behaviors in a group of autonomous robots by using artificial evolution techniques to evolve suitable controllers. The emergence of explicit communication protocols in the experiments is also studied in order to understand its influence on the behaviors the controllers evolved.

Since artificial evolution can be a time-consuming task, and because the random nature of the controllers produced in early generations can damage real robots, a simulator is often used to evolve and test the controllers. The controllers used in this study are Continuous Time Recurrent Neural Networks, whose synaptic connection weights, biases and decay rates are encoded into chromosomes. The chromosomes are produced by a genetic algorithm and evaluated by an evaluation function designed specifically for the task the simulated robots have to perform.

The controllers produced through artificial evolution are tested in terms of performance and scalability. The components of the simulator, such as evaluation functions, environments, experiments and physical objects, are described. Guidelines on how to create such components, as well as some code examples, are included in the report to allow future users to modify and improve the simulator.

Keywords: Collective evolutionary robotics, artificial evolution, self-organized aggregation, collective choice, resource management, artificial neural networks


Contents

List of Figures

1 Introduction
  1.1 Motivation
  1.2 Objectives
  1.3 Contributions
  1.4 Structure of the document

2 Related work
  2.1 Historic overview
  2.2 Evolutionary collective robotics overview

3 The simulator
  3.1 Simulation environment
  3.2 Components of the simulator
    3.2.1 The robots
    3.2.2 The controllers
    3.2.3 The evolutionary algorithm
    3.2.4 Evaluation functions
    3.2.5 The environments
    3.2.6 Physical objects
    3.2.7 The experiments
  3.3 Contributions to the simulator

4 Self-organized aggregation
  4.1 Introduction
    4.1.1 Experimental setup
    4.1.2 Evaluation function
    4.1.3 Results
    4.1.4 Discussion

5 Collective choice
  5.1 Collective choice without communication

    5.1.1 Experimental setup
    5.1.2 Evaluation function
    5.1.3 Results
    5.1.4 Discussion
  5.2 Collective choice with communication
    5.2.1 Experimental setup
    5.2.2 Results
    5.2.3 Discussion

6 Energy management
  6.1 Energy management without communication
    6.1.1 Experimental setup
    6.1.2 Evaluation function
    6.1.3 Results
    6.1.4 Discussion
  6.2 Energy management with communication
    6.2.1 Experimental setup
    6.2.2 Results
    6.2.3 Discussion

7 Conclusions
  7.1 Final report vs. preliminary report
  7.2 Future work

A Evaluation functions
  A.1 Example of an evaluation function

B Example of a configuration file
  B.1 Options

C Physical objects
  C.1 Example of a physical object

D Sensors
  D.1 Example of a sensor

E Actuators
  E.1 Example of an actuator

F Environments
  F.1 Example of an environment

G Experiments
  G.1 Example of an experiment

References


List of Figures

3.1 Scheme of the robot used in the simulator
3.2 Diagram of a neural network used in the experiments
3.3 Configuration file and simulator environment

4.1 Fitness trajectories of all 10 evolutionary runs
4.2 Scalability

5.1 Environments
5.2 Fitness trajectories of all 10 evolutionary runs
5.3 Following strategy
5.4 Scouting strategy
5.5 Average and standard deviation comparison
5.6 Fitness trajectory of all 10 evolutionary runs
5.7 Robots communicating to find a landmark
5.8 Scalability study
5.9 Result comparison between populations with and without the color actuator

6.1 Fitness trajectory of all 10 evolutionary runs
6.2 Energy management
6.3 Scalability study
6.4 Fitness trajectory of all 10 evolutionary runs
6.5 Scalability study
6.6 Scalability study


Chapter 1

Introduction

1.1 Motivation

Since the concept of the robot was created by the Czech writer Karel Čapek in 1920, people have dreamt of building intelligent machines able to perform tasks more efficiently than humans. Nowadays, robots are a reality in our society (even if not in the way Karel Čapek imagined in his play), and robotics is a serious field of research. Two of the branches of robotics research are evolutionary robotics and collective robotics.

One of the main sources of inspiration for evolutionary and collective robotics is nature and the way life evolved on our planet, from unicellular organisms to complex beings, generation by generation. Among the vast diversity of living organisms, insects are among the oldest, most numerous and simplest life forms on our planet. However, despite being simple in terms of cognitive capacities, they show complex collective behaviors that have allowed them to prosper in a variety of ecosystems.

Several studies have been made of complex insect societies, and their results are very useful to evolutionary and collective robotics research: insects are among the oldest animals on Earth, and the modern insects we see today are therefore the result of a long process of successful evolution. They have complex societies that strive for the survival of the colony instead of the individual. Much of their behavior is thus for the benefit of the society, and decisions are made in a collective, distributed way. Also, just as a colony of insects is able to establish a nest in a given location and exploit the natural resources available, a swarm of robots should be able to exploit an energy source in the most efficient way.

In this study, I aim to simulate the evolutionary process to obtain a similar collective decision-making behavior, as well as resource management behaviors that allow a group of robots to survive in a given environment. The emergence of explicit communication protocols in the experiments is also studied in order to understand its influence on the behaviors the controllers developed. The robots used in the experiments are controlled by a Continuous Time Recurrent Neural Network (CTRNN) [4] whose synaptic connection weights, biases and decay rates are encoded in chromosomes. The chromosomes are put through an evolutionary algorithm, and each chromosome is evaluated by an evaluation function designed specifically for the task the controllers have to perform. As in the Darwinian principle of reproduction of the fittest, the fittest chromosomes are selected to create the next generation, and so forth, until a suitable controller for the designated task is achieved or a certain number of generations is reached.

The artificial evolution process can be very time consuming, and the random nature of the early controllers can actually damage physical robots (in early iterations, evolution can follow a path that leads robots to bump into walls, for example). For these reasons, the process is carried out in software simulation.

Research in evolutionary robotics can be applied in many fields of study: in robotics, to develop efficient controllers for real-world robots that interact with complex environments; in biology, to study biological neural networks; and even to study the theory of evolution itself.

Advances in those fields may lead to real-world applications of evolutionary robotics: more efficient and intelligent robots may be developed to explore environments that are too dangerous or unknown for humans, a better understanding of the human brain may be achieved and it may even give us some insights into evolution in general.

1.2 Objectives

The aim of this report is to evolve and study emergent collective behaviors in simulated groups of robots, and to study the evolution of communication along with its advantages and disadvantages. Since there is an infinite number of real-world problems that a controller could be evolved to solve, I focused my research on three of the most basic behaviors a robot must have in order to perform more complex tasks: self-organized aggregation, collective choice and energy management. Collective choice is arguably one of the most common behaviors present in every group of living beings in nature, and it plays an important role in the survival of the colony (ants, cockroaches and some species of fish are good examples).

In order to survive, a group of living beings also has to develop strategies to manage the resources available in the environment, and that is another interesting behavior I wanted to study and tried to replicate through artificial evolution. The scalability of the controllers produced is also studied, to analyze the possibility of evolving controllers in a simple environment, with a small number of robots, that perform well in more complex environments with larger groups of robots.


1.3 Contributions

There is some remarkable research in this field regarding aggregation, collective choice and language emergence, as we can observe in the related work chapter. In this thesis, the collective choice and the energy management experiments compare populations capable of explicit communication through colors with populations without that ability. The fitness trajectories obtained by both populations are also compared in terms of how fast a good solution is evolved.

Aggregation, collective choice and energy management are studied, and I tried to evolve these kinds of behaviors in the simulator, using Continuous Time Recurrent Neural Networks (CTRNN) [4] and an evolutionary algorithm that will be explained in chapter 3. The experiments involved a population of robots controlled by the same neural network and a set of environments. The environments were designed to study how aggregation, collective choice and energy management behaviors would evolve in scenarios with different characteristics. To study simple aggregation, a simple environment with no landmarks was used. Collective choice was studied by adding landmarks, which the robots could detect at a limited distance, thereby forcing the population to develop strategies to find the landmarks and collectively choose one. Communication was then added to the previous problem to study the effect this ability has on the strategies developed. To study whether more complex strategies would emerge, the landmarks were replaced by energy sources and a limited amount of energy was given to the robots. The energy sources can charge a limited number of robots simultaneously, so an energy management strategy must emerge to keep the entire population alive. As in the previous problem, energy management was also studied with communication-capable robots. So far, I could not find in the literature descriptions of experiments where a population of robots had to learn how to manage an energy source using evolutionary robotics as described in this report. If these experiments are not a novelty, they are among very few.

1.4 Structure of the document

Chapter 2 is a short historic overview of evolutionary robotics (ER) and its current state of the art, referring to some of the most relevant work done so far. Chapter 3 describes the simulator used, as well as its main components and the evolutionary algorithm employed. It also explains how to create new components, such as new sensors, actuators, environments and so forth. Chapter 4 describes the experiment on self-organized aggregation, chapter 5 describes the collective choice experiments with populations with and without the ability to communicate through a color actuator, and chapter 6 describes the energy management experiments, again with and without communicating populations. There is a total of five experiments, presented from the simplest to the most complex scenario. Since the neural networks begin with random values, a remarkably good or bad controller can be achieved by chance, which would not provide good data about the effectiveness of the evaluation function used. To mitigate this, every experiment was run 10 times (evolutionary runs) with different initial random seeds.

Each experiment will be presented as follows:

* Purpose of the experiment in the context of my work
* Experimental setup used
* Description of the fitness function used
* Presentation of the results
* Discussion

The controllers produced are also tested in conditions slightly different from the ones used during their evolution, by varying the number of robots and/or the number of landmarks/energy poles present in the environment, so that characteristics like scalability can be studied.

Finally, in chapter 7, some conclusions are drawn regarding the experiments performed, the goals set in the preliminary report are compared with the goals achieved in the final report, and some future work is suggested. Appendices A to G offer code examples of the various components of the simulator, along with guidelines on how to create them. An explanation and an example of the configuration files are also included.


Chapter 2

Related work

2.1 Historic overview

The foundations of evolutionary robotics can be traced back to the late 1950s, when the evolutionary computation concept was developed [2], [14], [13]. Nevertheless, the evolutionary computation field remained relatively unknown to the broader scientific community for about thirty years [2].

The situation changed during the 1980s, when the available computational power became sufficient to conduct experiments and to solve real-world optimization problems. In 1985, a number of international conferences were held on techniques such as genetic algorithms and their theoretical aspects, and on evolutionary programming, but it was not until the 1990s that researchers in the different fields of evolutionary computation started to cooperate [2].

It was in the early 1990s, at the National Research Council in Rome, that the foundations of evolutionary robotics were laid. In 1992 and 1993, Floreano and Mondada at EPFL in Lausanne, Switzerland, and a research group at COGS, at the University of Sussex, reported experiments on the artificial evolution of autonomous robots [36]. The results were promising, and that triggered a wave of activity in labs around the world.

Lately, due to the difficulty of scaling up the complexity of the robots' tasks, research has started to focus more on the theoretical questions of the field rather than on the engineering challenges.

2.2 Evolutionary collective robotics overview

The primary goal of evolutionary robotics (ER) is to develop methods for automatically synthesizing intelligent autonomous robot systems [35]. Although most research in ER is about controllers, evolutionary algorithms are also applied to the creation of robot morphologies [26], but this topic will not be discussed further in this report.

Evolutionary robotics does not depend on research about nature, but it does use some of its results: there are some remarkable works on collective choice regarding cockroaches [22] and ants [10]. Garnier et al. [16] studied aggregation behaviors in cockroach larvae. In this study, different aggregation behaviors were observed, such as wall following and two different resting behaviors. The resting time in a single shelter has been shown to influence in which shelter cockroaches (Blatella) and ants (Oecophylla) will aggregate [10]. Jean-Marc Amé et al. [1] took this research a step further and developed a mathematical model that reproduces these collective responses, based on the relation between the resting period at a given site and the number of individuals. Studies about swarms in nature have influenced collective robotics: there have been some attempts to mimic aggregation behaviors, as described in [15], where a group of Alice [6] micro-robots was used to perform the task. An even more realistic scenario was proposed in [24], where flocking behaviors were evolved in ecosystems consisting of plants, herbivores and predators.

Even though nature is a main source of inspiration, collective robotics also takes more artificial approaches to aggregation and cooperation behaviors. [29] studied seeded aggregation and flocking using sound signals (chorusing). A similar experiment was proposed in "Evolving mobile robots able to display collective behaviors" [3], where robot controllers were evolved to make the robots aggregate and move together toward a light target. In the experiments described in that paper, the controllers evolved several strategies to perform the given task. Using a probabilistic model, N. Correll and A. Martinoli [7] showed how Alice micro-robots could aggregate, and a simpler approach to aggregation and flocking was taken by C. Moeslinger, T. Schmickl, and K. Crailsheim [33], who describe a simple flocking algorithm for very simple swarm robots which works without the need for communication, memory or global information. "Evolving controllers for a homogeneous system of physical robots: Structured cooperation with minimal sensors" [38] showed how a group of 3 robots with minimal sensory capabilities could aggregate and move as a group using controllers evolved in a simulator. Remarkably, the team members dynamically adopted roles to make the group movement easier. V. Trianni et al. [44] describe how artificial evolution can find simple solutions to the aggregation problem, mainly by exploiting invariants present in the environment and the complex interactions among the robots and between the robots and the environment.

An important aspect of a society is how information is transmitted between its members [28]. The individuals that form a society can be very complex (like humans) or very simple (like bees or ants) in terms of cognitive or sensory capabilities, but they still need to communicate in order for the society to survive. The complexity of the organisms influences the way communication is performed: humans are arguably the most complex known organisms and have developed a very complex form of communication, while ants, on the other hand, rely on very simple strategies, like depositing pheromones to mark a path to food [21]. Research in evolutionary robotics has been trying to make communication emerge among populations of robots that are simple in terms of cognitive and sensory capabilities, so that they can accomplish tasks more efficiently. The emergence of communication might seem very unlikely, because the robots need to produce signals that are useful and react to useful signals. However, as Maynard Smith said, "It's no good making a signal unless it is understood, and a signal will not be understood the first time it is made" [27], which means that the adaptive variations that lead to the production of useful signals, or to the exploitation of those signals, are adaptively neutral unless they develop at the same time. Nevertheless, experimental results have shown that communication protocols can emerge in some cases.

One of the earliest attempts to make communication emerge using artificial organisms can be found in the paper by G. Werner and M. Dyer [45]. The aim of this work was to evolve simple communication protocols for mate finding: female individuals had the ability to see males and emit sounds, and male individuals were blind but able to perceive the sounds emitted by females. The results showed a progression of generations with increasingly effective mate-finding strategies, as well as a number of groups with different signaling protocols. Angelo Cangelosi and Domenico Parisi performed experiments where neural networks living in an environment evolved a simple language to help other individuals decide whether a mushroom is edible or poisonous [37]. A similar experiment was performed in an environment with a food source and a poison source, using robots capable of emitting a blue light [32]. The robots had to find a way to warn the rest of the population when they found a food source or a poison source. The way the population was evaluated proved to be important for the final behavior: when selection was made at the individual level, the robots started to emit deceiving signals so they could exploit the food source more efficiently. The communication protocol the robots developed was not uniform across the experimental runs: in some cases the robots signaled the food source by emitting the blue light, and in other cases they signaled the poison. The question of how a population of initially non-communicating robots might develop communication skills that are functional with respect to the task the robots have to perform, without being rewarded for communication, is addressed in "Evolution of implicit and explicit communication in mobile robots" [9]. In this article, teams of 2 e-puck robots [34], with neural networks evolved in a simulator, had to find 2 landmarks and switch between them as often as possible. The robots were able to communicate through a Bluetooth connection but were not rewarded for doing so. Results showed that populations able to communicate explicitly developed a communication protocol to help them coordinate and cooperate. The robots also developed 2 different strategies to perform the task, which are thoroughly explained in the article.

More recently, some open challenges regarding the evolution of language in embodied agents were pointed out, and methods to address those challenges in a correct way were also established [30] [43]. The strategic aspects of communication are also important factors to take into consideration in order to better understand and guide the evolution of communication protocols [18]. A solution to the problem of honest communication among agents (which was faced in [32] and [31], for example) is presented: costly signalling. The main idea behind this concept is that if a signal is costly, the simple fact that the signal is present reveals something of value, even in the presence of a deceiving signal. Some examples of costly signalling can be found in our society: ownership of a Rolls-Royce is a credible signal of wealth because only a wealthy person can afford one.

Societies in nature also have to develop strategies to manage the resources available. There has been a lot of research on resource management, most of which is out of the scope of this report, but arguably the most famous work is "The tragedy of the commons" [19]. In this article, it is argued that the efficient management of resources by a growing population is impossible; the example given shows how a situation that is profitable for an individual can be harmful to the society. There has been some research on this question in settings close to ER, such as "Autonomous robots sharing a charging station with no communication: a case study" [41], where a group of self-sufficient mobile robots share a charging station using simple mechanisms, or "Market-based coordination of recharging robots" [25], where market-based coordination techniques are used to effectively allocate recharging tasks to maximize the productivity and efficiency of the team. However, artificial evolution is not used in either case. Surprisingly, it seems that the emergence of behaviors that would allow a group of robots to be self-sufficient has never been studied thoroughly.

All the experiments described above show that artificial evolution can be successfully applied to synthesize effective controllers. More than that, artificial evolution can develop controllers that would be nearly impossible to achieve if explicitly programmed [3].


Chapter 3

The simulator

3.1 Simulation environment

To perform the experiments, we use JBotEvolver (available at http://jbotevolver.sourceforge.net).

This simulator, written in the Java language, is an open-source project that allows us to simulate the behavior of a population of e-puck-like robots in a given environment. The number and type of actuators and sensors available on the robots are customizable, which means that we can simulate any given type of sensor and actuator.

JBotEvolver can conduct the artificial evolution of a controller for a population of robots in a given environment. To do that, the simulator relies on neural networks and evolutionary algorithms, along with a previously chosen evaluation function. A graphical user interface allows the user to see a representation of the environment, the robots and their behavior. Settings are chosen by editing a configuration file, which is unique to each experiment we conduct.

While the simulation is running, several data and configuration files are created, containing the fitness values of each generation, the best generation so far and the previously generated generations. This makes it possible to evaluate the evolution results of a given setup.

The simulator can also perform evolution in a distributed manner: the server side of the simulator distributes chunks of information for the clients to process, and the clients send back the results. This is a very useful feature for evolution with very complex experimental setups that require a lot of processing power.

3.2 Components of the simulator

3.2.1 The robots

The simulator can handle multiple types of robots, which are defined by the user. A robot is, at its core, a set of sensors, actuators and pre-programmed behaviors. However, it is possible to create more complex robots by extending the Robot class. The robots are customizable: the number and type of sensors and actuators, and the size of the robots, can be defined by altering the configuration file. Sensors and actuators can also be programmed, so the number of setups and types of robots the simulator can handle is vast. An example of a configuration file can be found in Appendix B.
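To make the configuration-driven setup concrete, the sketch below shows the general shape such a file could take. All section and key names here are hypothetical illustrations, not the simulator's documented syntax; the real format is the one shown in Appendix B and in figure 3.3(b).

# Hypothetical configuration sketch (illustrative names only)
--robots
    classname = DifferentialDriveRobot   # assumed robot class
    numberofrobots = 5
    diameter = 0.1                       # meters
    sensors = RobotProximitySensor (number=8, range=5.0, angle=135)
    actuators = TwoWheelActuator
--population
    generations = 100
    chromosomes = 100
    samples = 4
    mutationrate = 0.15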

To perform the experiments described in this document, a differential drive, cylindrical robot model with 10 cm of diameter was used. The actuators are the two wheels that allow the robot to move, and an RGB actuator used as a communication means when communication is enabled. The wheels can be controlled independently, allowing the robots to steer and rotate, and the RGB actuator can produce colors across the entire RGB spectrum, so the controllers can evolve a language to communicate. An example of an actuator and some guidelines on how to create new actuators can be found in Appendix E.

Each robot is also equipped with a variety of sensors that detect other robots nearby, the landmarks used, the energy poles and the robot's internal energy. Each sensor detects a specific type of object: another robot, the landmark object or the energy pole. Except for the internal energy sensor, all sensors are arranged on the robot's body, as shown in figure 3.1, and have an opening angle of 135°, so the sensors cover enough space to give the controller accurate information about the location of the physical objects (both distance and relative position with respect to the robot). The range varies according to the needs of the experiment. If there are no sources within the sensor's range and opening angle, its reading will be 0; otherwise the reading will be based on the distance to the closest source c, according to the following equation:

$$ s = \frac{range - d_c}{range}, \tag{3.1} $$

or

$$ s = \frac{range - d_c}{range} \cdot \tau, \tag{3.2} $$

in the case of the energy pole sensor, where range is the sensor's detection range, d_c is the distance between the closest source c and the sensor, and τ is the energy level of the energy pole, normalized between 0 and 1. To get results that are closer to the ones expected if the controllers were used in the real world, the sensors also have a noise factor. An example of a sensor and some guidelines on how to create new sensors can be found in Appendix D.
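To illustrate equations 3.1 and 3.2, the following Java sketch computes a sensor reading from the distance to the closest source. It is not the simulator's actual Sensor class, and the Gaussian noise model is an assumption: the text only states that a noise factor exists, not its distribution.

import java.util.Random;

// Illustrative sensor model implementing equations 3.1 and 3.2.
public final class SensorModelSketch {
    private final double range;          // detection range in meters
    private final Random random = new Random();

    public SensorModelSketch(double range) {
        this.range = range;
    }

    // Reading for the closest source at distance dc (equation 3.1).
    public double read(double dc) {
        if (dc > range) {
            return 0.0;                  // no source within range
        }
        return addNoise((range - dc) / range);
    }

    // Reading for an energy pole with normalized energy tau (equation 3.2).
    public double readEnergyPole(double dc, double tau) {
        if (dc > range) {
            return 0.0;
        }
        return addNoise(((range - dc) / range) * tau);
    }

    // Assumed noise model: small Gaussian perturbation, clamped to [0, 1].
    private double addNoise(double s) {
        double noisy = s + random.nextGaussian() * 0.05;
        return Math.max(0.0, Math.min(1.0, noisy));
    }
}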


Figure 3.1: Scheme of the robot used in the simulator

3.2.2 The controllers

Controller classes can be created to control the robots, which means that the robots can use multiple techniques to navigate in the environments, from simple state machines to neural networks.

In the experiments described in this document, the robots are controlled by a Continuous Time Recurrent Neural Network [4] consisting of three layers of neurons: an input layer with N neurons, depending on the number and type of sensors used, a hidden layer with 5 neurons, and an output layer with 2 or 3 neurons, depending on whether the robots use the color actuator. The input neurons I_i are connected to the sensors, so they react to other nearby robots, landmarks, energy poles or the internal energy level. The neurons in the hidden layer are fully connected and are governed by the following equation:

$$ \tau_i \frac{dH_i}{dt} = -H_i + \sum_{j=1}^{N} \omega_{ji} I_j + \sum_{k=1}^{5} \omega_{ki} Z(H_k + \beta_k), \tag{3.3} $$

where τ_i is the decay constant, H_i is the neuron's state, ω_{ji} is the strength of the synaptic connection from neuron j to neuron i, β_k are the bias terms, and

$$ Z(x) = \left( 1 + e^{-x} \right)^{-1} \tag{3.4} $$

is the sigmoid function. τ_i, β_k and ω_{ji} are genetically controlled network parameters. Cell potentials are set to 0 when the network is initialized or reset, and circuits are integrated using the forward Euler method with an integration step size of 0.1.
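The sketch below shows how equations 3.3 and 3.4 can be integrated with the forward Euler method and a step size of 0.1, as described above. It is an illustrative reimplementation rather than the simulator's controller class; all names and the weight layout are assumptions.

// Illustrative forward Euler integration of the hidden layer (eq. 3.3).
public final class CtrnnSketch {
    private static final int HIDDEN = 5;     // hidden-layer size used in the thesis
    private static final double DT = 0.1;    // integration step size

    private final int inputs;
    private final double[] tau;              // decay constants (genetically evolved)
    private final double[] beta;             // biases (genetically evolved)
    private final double[][] wIn;            // wIn[j][i]: input j -> hidden i
    private final double[][] wHid;           // wHid[k][i]: hidden k -> hidden i
    private final double[] h = new double[HIDDEN]; // cell potentials, start at 0

    public CtrnnSketch(int inputs, double[] tau, double[] beta,
                       double[][] wIn, double[][] wHid) {
        this.inputs = inputs;
        this.tau = tau;
        this.beta = beta;
        this.wIn = wIn;
        this.wHid = wHid;
    }

    // Sigmoid function Z(x) of equation 3.4.
    private static double z(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // One Euler step of equation 3.3 for the given sensor readings.
    public void step(double[] in) {
        double[] dh = new double[HIDDEN];
        for (int i = 0; i < HIDDEN; i++) {
            double sum = -h[i];
            for (int j = 0; j < inputs; j++) {
                sum += wIn[j][i] * in[j];
            }
            for (int k = 0; k < HIDDEN; k++) {
                sum += wHid[k][i] * z(h[k] + beta[k]);
            }
            dh[i] = sum / tau[i];
        }
        for (int i = 0; i < HIDDEN; i++) {
            h[i] += DT * dh[i];              // forward Euler update
        }
    }
}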

The output layer is fully connected to the neurons in the hidden layer. The activation of the output neurons is given by the following equation:

$$ O_i = \sum_{k=1}^{5} \omega_{ki}\, Z(H_k + \beta_k) \tag{3.5} $$

Each output neuron is connected to an actuator and is responsible for its behavior. Outputs O_1 and O_2 control the left and right wheels, respectively, and their output is linearly mapped to speeds between [-50 cm/s, 50 cm/s]. Output O_3 controls the color of the robot when the color actuator is used.

All robots have instances of the same controller, so the population is homogeneous. Figure 3.2 shows a diagram of the neural network used when the robots have proximity, landmark and color sensors, wheels and a color actuator.

Figure 3.2: Diagram of a neural network used in the experiments

3.2.3 The evolutionary algorithm

Evolutionary algorithms (EAs) are computer programs that attempt to solve problems by applying the Darwinian concept of evolution [8]. In EAs, each individual is represented by a vector that encodes a single possible solution to the problem. The EA starts with a random initial population of µ individuals. Each individual is evaluated by a fitness function and then assigned a fitness value. Individuals with higher fitness values are considered better solutions and are thus used to produce (and sometimes to be part of) the next generation, and so forth, until a goal fitness is achieved or a limit number of generations is reached. The next generation is produced by using mutation, recombination and/or crossover operators. The simulator refers to the evolutionary algorithm as the population. Population classes are in the evolutionaryrobotics.populations package and extend the Population abstract class.

Evolutionary algorithms have different implementations [23]. There are three main families of EAs: genetic algorithms, developed by Holland [20] and thoroughly reviewed by Goldberg [17]; evolution strategies (ESs), developed by Rechenberg [39] and Schwefel [40]; and evolutionary programming (EP), developed by L. J. Fogel et al. [12] and refined by D. B. Fogel [11].

In the set of experiments described in this report, a genetic algorithm was used to perform evolution (a [µ, λ] algorithm). The algorithm uses generations that consist of 100 chromosomes each (λ). The chromosomes encode the weights of the synaptic connections, the biases and the decay constants of the neural network. The topology of the neural network is not subject to evolution, but hand designed and defined globally and externally. The fitness of each chromosome is sampled (meaning the chromosome is tested a number of times, and its final fitness is the average of the fitness values achieved in each sample), and the 5 best are chosen (µ) to create the next generation, discarding the remaining ones. Each of the 5 chosen parents creates 19 children and, finally, both parents and children form the next generation. The genotype of the children is obtained by adding a random Gaussian offset to each gene with a probability of 15% (mutation).
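The generation step described above can be sketched as follows. This is illustrative code, not the simulator's Population class; in particular, the standard deviation of the Gaussian offset is an assumed value, since it is not stated here.

import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;

// Illustrative (mu, lambda) generation step: 5 parents, 19 children each,
// parents kept, 15% per-gene Gaussian mutation -> 100 chromosomes.
final class GenerationStepSketch {
    static final int MU = 5;
    static final int CHILDREN_PER_PARENT = 19;
    static final double MUTATION_RATE = 0.15;
    static final double SIGMA = 1.0;          // assumed mutation magnitude
    static final Random RANDOM = new Random();

    static double[][] nextGeneration(double[][] genomes, double[] fitness) {
        // Sort indices by sampled fitness, best first.
        Integer[] order = new Integer[genomes.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble(i -> -fitness[i]));

        double[][] next = new double[MU * (1 + CHILDREN_PER_PARENT)][];
        int n = 0;
        for (int p = 0; p < MU; p++) {
            double[] parent = genomes[order[p]];
            next[n++] = parent.clone();        // parent survives unchanged
            for (int c = 0; c < CHILDREN_PER_PARENT; c++) {
                double[] child = parent.clone();
                for (int g = 0; g < child.length; g++) {
                    if (RANDOM.nextDouble() < MUTATION_RATE) {
                        child[g] += RANDOM.nextGaussian() * SIGMA;
                    }
                }
                next[n++] = child;
            }
        }
        return next;                           // 5 + 5 * 19 = 100 chromosomes
    }
}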

Parameters of the evolutionary algorithm used, such as chromosome number, number of time steps, number of generations and mutation rate can be set in the population section of the configuration file. Different evolutionary algorithms can also be created and used.

3.2.4 Evaluation functions

To evaluate the performance of the chromosomes of each generation, each experiment has an evaluation function. The evaluation function scores the chromosomes with a fitness value. The fitness of each chromosome is sampled a number of times during a number of control steps defined in the configuration file, and selection is based on the average fitness obtained. The fitness value is always calculated at group-level with an evaluation function.

The evaluation functions used will be described in detail in each experiment description. See Appendix A for an example of an evaluation function and some guidelines on how to create new ones.

3.2.5 The environments

The environment is the virtual place where the simulation takes place. The environment, as well as its parameters, is chosen by the user in the configuration file. An environment has a set of physical objects and rules governing the interactions between them and the robots, such as collisions. The number, size, placement and so forth of the physical objects are also defined in the environment.

The environments may or may not be static: it is possible, for example, to make the number, size and/or position of a given physical object vary in each sample by adding a parameter in the environment section of the configuration file. Figure 3.3(a) shows an example of an environment, and figure 3.3(b) shows an example of a configuration file.

(a) Energy Management environment (b) Configuration file example

Figure 3.3: Configuration file and simulator environment

3.2.6 Physical objects

Physical objects are all the objects present in the environment where the evolution occurs. Everything that the robots can interact with (besides the robots themselves) is a physical object. New physical objects, with their rules of interaction, can be created in the simulator. An example of a physical object and an explanation of how to create one can be found in Appendix C.

3.2.7 The experiments

The experiment classes define the rules of the experiment, such as how the robots are placed in the environment, the types of robots the experiment will have (enemy and friendly robots, for example, will have different setups, and those setups are defined here), and even whether the number of robots can vary in each sample and how that variation is made. The environment is also created in the experiment class, as well as the chromosome and the controller. If the controller is a neural network, its weights are also defined here. An example of an experiment class and some guidelines for creating new ones can be found in Appendix G.

3.3 Contributions to the simulator

Some functionalities were developed for the JBotEvolver simulator, but not all of them were used in the experiments described in this report, mainly due to the poor or inconclusive results achieved by the experiments for which they were developed. However, I believe they may be useful in the future. Among the new functionalities for the robots, we can highlight the Near robot sensor, which detects nearby robots, and an artificial pheromone that mimics ant communication, along with the respective actuator (the mechanism that places the pheromone in the environment) and sensor.

Two new environments were created, namely the Landmark and the Manage environments. In the Landmark environment, landmarks, which the robots can detect, are placed. The placement can be regular, meaning the landmarks are placed along a circumference at a constant distance from each other, or random, where the landmarks are randomly placed in the arena without overlapping. The size of the landmarks can be customized, as well as the distance separating them. The Manage environment works in a similar manner, but the objects it contains are charging poles. This environment has all the customization options available in the Landmark environment, and it also allows defining the number of robots each charging pole can recharge at a time, the initial energy of the robots, their energy consumption per time unit and their energy gain per time unit when recharging. Both environments can also change during simulation: it is possible to change the number and placement of the landmarks or charging poles in each simulation run or sample. The way the number and placement of the objects varies follows a fixed pattern: for example, if the simulation has 4 samples, the objects will be placed in 4 fixed (randomly assigned) different places, and the number of objects will increment by adding the original number of landmarks 4 times. This allows the controllers to evolve in a set of different environments, thus making them more general.

A new experiment was also created, called Aggregation. In this experiment, the number of robots can vary in the same way as the number of objects present in the environment. Several evaluation functions were created to evaluate the controllers, namely CenterOfMass and CenterOfMassAndClusters (which counts how many clusters of robots are present at any time) to evaluate self-organized aggregation, Recruiting to evaluate collective choice, and Survival to evaluate energy management in the energy management experiments.

This report also explains in detail how to create sensors, actuators, experiments, environments, evaluation functions and configuration files for the simulator, so it can be used as a user manual for the simulator.


Chapter 4

Self-organized aggregation

4.1 Introduction

Aggregation is one of the most pervasive behaviors in nature [5]: fish aggregate so they look like a larger animal to potential predators, and some insects, like cockroaches, aggregate in valuable locations to survive and establish their societies [46]. Life would not be possible without aggregation phenomena: nucleons aggregate and form new elements due to the strong nuclear force, thus creating the building blocks of life. Aggregation behaviors can be enhanced by environmental cues, such as light, humidity or food abundance, to name a few. However, aggregation can also be self-organized: cockroach aggregation behaviors result from an emergent cooperative decision [42]. The study of aggregation and cooperative decision is important to swarm robotics: swarm robotics usually uses simple robots that have to cooperate, and that cooperation usually requires the robots to be close to one another. Aggregation is also an important behavior for swarms of robots to self-assemble or perform pattern formation.

This experiment aims to evolve a self-organized aggregation behavior. The robots must form a single cluster, and that cluster should be stable (i.e., robots should not leave the cluster once they are in it). The place where the robots aggregate is not important for this experiment, so the cluster can be formed anywhere in the environment.

4.1.1 Experimental setup

For this experiment, I used a population of 5 differential drive robots with 8 robot proximity sensors. The opening angle of the sensors is 135° and the sensor range is 5.0 meters. 10 evolutionary runs were performed with this setup, with different initial random seeds, for 100 generations each. The fitness of each genome was sampled 4 times in trials of 2 minutes of virtual time (1200 control steps) each. In each sample, the robots are placed at random positions, so evolution cannot exploit the starting point of the robots to enhance aggregation; the bias values of the robot proximity sensors also vary.


The environment used to perform the experiments is an infinite space with no landmarks or walls, so the only references the robots have are other robots nearby.

4.1.2 Evaluation function

To evaluate the controllers produced by the evolution, I used an adaptation of the approach described by Trianni et al. [44]: the evaluation function reads the coordinates of all robots and calculates their center of mass, a point that is always closest to where the majority of the robots are. The fitness value combines the average distance of the group to its center of mass, weighted by a factor, with the number of neighbors of each robot, weighted by another factor:

$$ f_s = \frac{1}{\omega} \sum_{t=0}^{\omega} \frac{1}{n} \left( \frac{10}{11} \sum_{i=1}^{n} \left( 1 - \frac{d_i(t)}{\delta} \right) + \frac{1}{11} \sum_{i=1}^{n} \frac{\rho(i)}{n(n-1)} \right) \tag{4.1} $$

where f_s represents the fitness of a sample, n is the number of robots used, t is the current control step, d_i(t) is the distance of the i-th robot to the center of mass, δ is the sensor range of the robots, ρ(i) is the number of neighbors of the i-th robot, and ω is the number of steps per run. The distance d_i(t) varies between 0 and the maximum range of the sensor used. The fitness is calculated at group level and varies between [0, 1]. The final fitness value of a chromosome is the average of the fitness values obtained over all samples.
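As an illustration, the sketch below computes the per-sample fitness of equation 4.1, assuming the distances to the center of mass and the neighbor counts have already been logged for every control step. It is not the actual CenterOfMass evaluation function of the simulator; see Appendix A for a real example.

// Illustrative computation of equation 4.1 over one sample.
final class AggregationFitnessSketch {
    // dist[t][i]:  distance of robot i to the center of mass at step t
    // neigh[t][i]: number of neighbors of robot i at step t
    // delta:       sensor range of the robots
    static double sampleFitness(double[][] dist, int[][] neigh, double delta) {
        int omega = dist.length;                 // number of control steps
        int n = dist[0].length;                  // number of robots
        double total = 0.0;
        for (int t = 0; t < omega; t++) {
            double proximity = 0.0;
            double clustering = 0.0;
            for (int i = 0; i < n; i++) {
                proximity += 1.0 - dist[t][i] / delta;               // distance term
                clustering += neigh[t][i] / (double) (n * (n - 1));  // neighbor term
            }
            total += (10.0 / 11.0 * proximity + 1.0 / 11.0 * clustering) / n;
        }
        return total / omega;                    // average over the run, in [0, 1]
    }
}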

4.1.3 Results

In all 10 evolutionary runs of the experiment, the robots managed to aggregate in a very efficient way: as we can observe in figure 4.1, which shows the fitness trajectories of all evolutionary runs, the values obtained are close to 0.9, which represents quick aggregation. The fitness trajectories also show that an optimal controller was reached early in the evolution process.

Even though there is elitism in the evolutionary algorithm used, the fitness can still drop from generation to generation, as observed in the fitness trajectories. This is because a random seed is used in each sample, which defines the sensor noise and the initial positions. The random seed can be such that the robots perform worse (e.g., slightly higher sensor noise). For that reason, each experiment is performed a number of times (evolutionary runs), so that the results obtained are accurate and independent of the random nature of the initial positions of the robots, the sensor noise and the first generation.

Figure 4.1: Fitness trajectories of all 10 evolutionary runs

Ten samples of the last generation of each evolutionary run were run to observe the strategies the robots developed to aggregate: the robots simply converge to the population's center of mass. When a single cluster of robots is formed, the cluster wanders through the arena. The cluster is cohesive but never stops moving, and that is the reason for the collective movement observed. The cause of this behavior is that every robot tries to get as close to the population's center of mass as possible. Since the center of mass is not an environmental cue (and therefore the robots are unable to detect it), the robots first form groups with the closest robots, which then merge with other groups and so forth, until a single cluster is formed. The robots have no notion of group size, so the movement observed in clustered robots is of great help in ensuring that a single cluster is formed. The behavior described is also important for the scalability of the evolved controllers: the robots are always searching for other robots while maintaining the cluster, so there is no limit to the number of robots that can aggregate.

Scalability

To study scalability, and to confirm that the solution achieved for self-organized aggregation is highly scalable, the best generation of each evolutionary run was tested with 5, 10, 15 and 100 robots, with 10 samples each; the result is the average of the fitness values obtained.

Figure 4.2(a) compares the fitness values achieved by all 10 evolutionary runs when tested with 5, 10, 15 and 100 robots and figure 4.2(b) shows the average fitness value and standard deviation.

The fitness values obtained in these tests remain close to 0.9, which means the solution achieved by evolution using 5 robots can aggregate groups of robots of different sizes.

To study how the controller would handle extremely large groups of robots, a test with a population of 100 robots was performed 10 times on the last generation of each evolutionary run. The results show a drop in the fitness values, which can be due to the cluster size: the bigger the population, the bigger the cluster diameter, so the robots at the outer parts of the cluster, despite being in the cluster, are far from the center of mass, which reduces the final fitness value. Another reason is the time the robots take to form the cluster when the population is large: cluster formation starts with small clusters that merge until a single cluster is formed. The larger the population, the more time it takes to form a single cluster, which also reduces the final fitness value.

(a) Scalability comparison

(b) Average and standard deviation

Figure 4.2: Scalability

4.1.4 Discussion

In this chapter, self-organized aggregation behaviors were evolved, evaluated and tested. The results show that, with the evaluation function and the robot setup used, controllers can be successfully evolved to perform self-organized aggregation. The controllers produced proved to be efficient and scalable.

However, some issues can be pointed out. The evaluation function does not take into consideration the diameter of the final cluster, so the final fitness values of large populations are not as accurate as those of small populations. During my work, I tried to mitigate this problem, without success, by developing a formula where the center of mass would be a circle instead of a point in the arena. That way, every robot within the center-of-mass circle would have a distance value of 0, thus maximizing the fitness value. As future work, developing a more precise way to score large populations performing self-organized aggregation would, in my opinion, be very useful. The fact that the robots have a very large sensor range can also bias the results: the robots do not need to find their neighbors, as all of them are within sensor range. The time the experiments take to complete was a limiting factor that made me decide not to study self-organized aggregation any further. However, relating the fitness achieved to the sensor range could be a good way to continue the study of self-organized aggregation.

My next step was to study the evolution of collective choice behaviors. How will evolution develop controllers capable of choosing a landmark and aggregating on it? The next section describes how collective choice behaviors were evolved and the strategies that emerged.


Chapter 5

Collective choice

5.1 Collective choice without communication

The next experiment aims to evolve collective choice behaviors. Collective choice behaviors are present in several societies, such as insect societies, where they are the main source of decisions. In fact, the strength of this kind of society relies on the fact that the group interest is always above the individual interest. Members of such societies often have very limited cognitive and communication capabilities, so decisions have to be made collectively. To do so, societies have to develop strategies to explore the environment and make decisions accordingly, as if the society were, in fact, a single organism.

In this set of experiments, I evolved some collective choice behaviors, based on the way a group of cockroaches, which are limited in terms of cognitive and communication capabilities, can collectively choose a shelter [10]. The robots must cluster in one of the landmarks placed in the arena. The sensor range is very limited, so the robots must adopt a strategy to first find a landmark and then cluster inside.

5.1.1 Experimental setup

For this experiment, I used a population of 5 differential drive robots, with 8 robot proximity sensors and 8 landmark sensors. The opening angle of the sensors is 135° and the sensor range is 5.0 meters for the proximity sensors and 0.1 meters for the landmark sensors, which means that the robots will not detect the landmarks when the simulation starts; they have to actively search for them. 2 landmarks are used, with 0.30 meters of diameter and at 2.0 meters from each other. Fig. 5.1(a) shows how the landmarks were placed to perform the evolution. The number and position of the landmarks never change during the simulation; however, to study scalability, a 5-landmark environment is also used. In this case, the landmarks are placed in a circle, as shown in fig. 5.1(b). 10 evolutionary runs were performed with this setup, with different initial random seeds, for 50 generations each. The fitness of each genome was sampled 4 times in trials of 2 minutes of virtual time (1200 control steps) each.
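For the 5-landmark environment, the circular placement can be computed as in the following sketch (the radius value is an assumption for illustration; the actual arena configuration may use a different one):

import math

def landmarks_on_circle(n=5, radius=1.0):
    """Place n landmarks evenly spaced on a circle around the arena center."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n))
            for k in range(n)]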

In each sample, the robots are placed at the center of the environment, so a robot will never begin close to, or even inside, a landmark, which would oversimplify the experiment, since a landmark could already be found by a robot before the beginning of the experiment. The bias of the sensors and the individual positions of the robots in the initial cluster vary in each sample.

Figure 5.1: Environments. (a) Simulation environment used to perform evolution; (b) the 5-landmark environment.

5.1.2 Evaluation function

To evaluate the controllers produced, I developed an evaluation function that first calculates the landmark nearest to the population's center of mass and then averages the distance between each robot and that landmark. The final fitness value of a chromosome is the average of the fitness values obtained in each sample.

f_s = \frac{1}{\omega} \sum_{t=0}^{\omega} \frac{1}{n} \sum_{i=1}^{n} \left(1 - \frac{d_i(t)}{\delta}\right) \qquad (5.1)

where f_s represents the fitness of a sample, n is the number of robots used, t is the control step, d_i(t) is the distance of the i-th robot to the landmark nearest to the population's center of mass, δ is the range of the proximity sensor, and ω is the number of steps per run. The distance d_i(t) varies between 0 and the maximum range of the sensor used. The fitness is calculated at the group level and varies between [0, 1].
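A direct reading of equation 5.1 in code could look as follows (a minimal sketch; robot_positions, landmark_positions and trajectory are illustrative names, and the clamping of d_i(t) to the sensor range follows the bounds stated above):

import math

def step_score(robot_positions, landmark_positions, delta):
    """Per-step term of equation 5.1: the mean of (1 - d_i/delta) over all
    robots, where d_i is the distance to the landmark nearest to the
    population's center of mass."""
    n = len(robot_positions)
    cx = sum(x for x, _ in robot_positions) / n
    cy = sum(y for _, y in robot_positions) / n
    # landmark nearest to the center of mass
    lx, ly = min(landmark_positions,
                 key=lambda p: math.hypot(p[0] - cx, p[1] - cy))
    score = 0.0
    for x, y in robot_positions:
        # clamp to the sensor range so the term stays in [0, 1]
        d = min(math.hypot(x - lx, y - ly), delta)
        score += 1.0 - d / delta
    return score / n

def sample_fitness(trajectory, landmark_positions, delta, omega):
    """Equation 5.1: average of the per-step term over the omega control
    steps of a trial; trajectory holds the robot positions at each step."""
    return sum(step_score(p, landmark_positions, delta)
               for p in trajectory) / omega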

5.1.3 Results

In all 10 evolutionary runs of the experiment, the robots managed to find a landmark and aggregate in it in a very efficient way: as we can observe in fig. 5.2, which shows the fitness trajectories of all evolutionary runs, the values obtained are close to 0.9, which represents a quick aggregation. The fitness trajectories also show that an optimal controller was reached early in the evolutionary process.

Figure 5.2: fitness trajectories of all 10 evolutionary runs

The strategies adopted by the populations to find and aggregate inside a landmark are very simple but effective. I was able to distinguish 2 different strategies, which I will call Following and Scouting.

In the Following strategy, all the robots split from the initial group and start to move in circles, with a variable radius, until a landmark is found by one of them. When a robot finds a landmark, it stops. This creates a chain reaction: the robot right behind the one that found the landmark reaches the landmark and stops, and so forth, until all robots are inside the landmark. Figures 5.3(a), 5.3(b) and 5.3(c) show the process of finding a landmark and clustering in it.

Figure 5.3: Following strategy. (a) The robots move in a circle; (b) the landmark is found; (c) the robots cluster.

In the Scouting strategy, only a small number of robots leave the initial cluster and start to move in circles. The remaining robots also move, clustered, according to the scout robots' positions: the group is never too far away from the scouts, and the scouts will, from time to time, rejoin the cluster. If a robot finds a landmark, it stays inside, thus pulling the other robots inside too. Figures 5.4(a), 5.4(b) and 5.4(c) show the process of finding a landmark and clustering in it.


Figure 5.4: Scouting strategy. (a) The scout finds the landmark; (b) the remaining robots follow; (c) the robots cluster.

Scalability

To study scalability, the controllers achieved by each evolutionary run were tested in the 2- and 5-landmark environments, each tested with 5, 10 and 15 robots, for a total of 6 tests. Each test was performed 10 times, with different initial seed numbers, and the results are the average fitness values obtained.

The first scalability test shows the behavior of the controllers in an environment with 2 landmarks. Figure 5.5(a) and figure 5.5(b) show a significant drop in the fitness achieved by the controllers when the number of robots is increased. This drop occurs when the robots find the 2 landmarks and form 2 different clusters, or when the population takes a long time to find a landmark and cluster. However, when just 5 robots are used, they managed to find and cluster inside a landmark in all of the evolutionary runs. By observing the robots' strategies, I concluded that the controllers that developed a scouting strategy scale better when the number of robots is increased. Evolutionary runs 2, 6 and 8 are the ones that developed the clearest scouting behavior and therefore the ones with the best overall results.

The scalability test with 5 landmarks obtained similar results, as we can observe in figures 5.5(c) and 5.5(d). Except for the controllers of evolutionary runs 1 and 5, all controllers managed to find and cluster in one of the 5 landmarks of the environment. In terms of the number of robots, scalability is slightly worse than in the 2-landmark environment; however, evolutionary runs 2, 6, 8 and 10 managed to find and cluster in a landmark efficiently. Once again, those controllers show a scouting strategy, proving once more its superiority in terms of scalability.

Figure 5.5: Average and standard deviation comparison. (a) 2 landmarks with 5, 10 and 15 robots; (b) average and standard deviation, 2 landmarks; (c) 5 landmarks with 5, 10 and 15 robots; (d) average and standard deviation, 5 landmarks.

5.1.4 Discussion

In this set of experiments, some collective choice behaviors were observed. The robots adopted 2 main strategies, Following and Scouting, to successfully find a landmark and aggregate in it. The Scouting strategy seems to be superior. This can be explained by the fact that only a limited number of robots leave the initial cluster, where they begin the simulation, to search for a landmark. On the other hand, the Following strategy relies on brute force to find the landmarks, which means spreading the population as much as possible, hoping that some robot will find a landmark. However, when the populations are bigger, several robots can find different landmarks, thus forming a cluster per landmark found.

The results presented in this experiment showed that between 30% and 40% of the evolutionary runs developed a Scouting strategy, but more evolutionary runs would be necessary to verify this success rate. However, this study shows that it is possible to evolve a controller general enough to solve problems other than the ones faced during evolution. As future work, an evaluation function capable of raising the percentage of controllers showing a Scouting strategy would be interesting.

Insects capable of making a collective choice differ in a very important aspect from the robots used to perform these experiments: they can communicate. So the questions raised when analyzing these results are: will communication between robots emerge? And if so, will it improve the results?

5.2 Collective choice with communication

The purpose of this experiment is to study the emergence of communication and its influence on the collective choice behaviors. The robots now have a channel to communicate: a color actuator that can take any color in the RGB spectrum, which other robots can detect.

In this experiment, evolution will create strategies to choose a landmark and also a language the robots will use to communicate. Since communication is not the main goal, the controllers will not be scored based on the way communication is done, but rather on whether they can accomplish the task or not, which is to collectively choose a landmark and cluster in it. The configuration of the robots used in this experiment is similar to the setup used in section 5.1: the only difference is the usage of a color sensor and actuator as a means of communication.

The results of this experiment will be compared to the results obtained in the experiment shown in section 5.1, and some conclusions about the usefulness of communication in this particular problem will be drawn.

5.2.1 Experimental setup

For this experiment, I used a population of 5 differential drive robots, with 8 color sensors, 8 landmark sensors and 8 robot proximity sensors. The opening angle of the sensors is 135° and the sensor range is 5.0 meters for the color sensors and robot proximity sensors and 0.1 meters for the landmark sensors. An additional actuator is used to make the robot change colors. The color actuator can take any color in the RGB spectrum, and the color is used to communicate. 10 evolutionary runs were performed with this setup, with different initial random seeds, for 50 generations each. The fitness of each genome was sampled 4 times in trials of 2 minutes of virtual time (1200 control steps) each. 2 landmarks were used, with 0.30 meters of diameter and at 2.0 meters from each other. The number and position of the landmarks never change during the simulation; however, to study the scalability of the controllers produced, a 5-landmark environment is also used. In this case, the landmarks are placed in a circle, as shown in fig. 5.8(b). The bias of the sensors and the individual positions of the robots in the initial cluster vary in each sample.
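In terms of the controller's interface, the color channel simply adds inputs and outputs to the neural network. The following sketch shows one plausible I/O layout (the ordering and the encoding of the color readings are my own assumptions, not the actual implementation):

NUM_PROXIMITY = 8  # robot proximity sensors
NUM_LANDMARK = 8   # landmark sensors
NUM_COLOR = 8      # color sensors reading the colors shown by other robots

def pack_inputs(proximity, landmark, color):
    """Concatenate the sensor readings into the network's input vector."""
    assert len(proximity) == NUM_PROXIMITY
    assert len(landmark) == NUM_LANDMARK
    assert len(color) == NUM_COLOR
    return list(proximity) + list(landmark) + list(color)

def unpack_outputs(outputs):
    """Split the network's outputs into wheel speeds and the RGB color
    to display: [left_wheel, right_wheel, red, green, blue]."""
    left, right, r, g, b = outputs
    return (left, right), (r, g, b)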


The evaluation function used is the same as in section 5.1's experiment, so the communication itself does not contribute to the overall fitness.

5.2.2 Results

Fig. 5.6 shows the fitness trajectories of all 10 evolutionary runs. As we can observe, as in the experiment described in section 5.1, controllers able to perform the task emerged in early generations, and by the last generations the fitness reached values near 1.

Figure 5.6: fitness trajectories of all 10 evolutionary runs

The main difference between the results in this experiment and the experiment described in section 5.1 is in the strategies developed. The Scouting and Following behaviors did not always emerge; instead, the robots showed more erratic behaviors. The reason for this phenomenon is that some evolutionary runs evolved a form of communication, as we can observe in figure 5.7, to notify the other robots when they find a landmark, so strategies based on predictable movements, like Following and Scouting, are less likely to emerge.


Figure 5.7: Robots communicating to find a landmark. (a) The robots search for a landmark; (b) the landmark is found; (c) the robots cluster.

Evolutionary runs 5, 6, 7 and 10 did not develop a communication protocol. Since the color sensor works like the proximity sensor when it detects a color on another robot, some evolutionary runs adopted strategies similar to the ones adopted in the previous experiment, where the robots could not communicate by changing colors.

