We launched a pilot evolution of a single population of reactive agents with the body of Robot1 to enable them to solve our navigation task in the double-T maze. These agents experienced a fixed initial position and orientation during evolution, as here we were only interested in whether our stimuli were distinguishable and whether our task could be solved. This evolution produced population RP1, with its best agent achieving a fitness score of 9.667 and the final population averaging a fitness score of 9.210.
Figure 4.1: Progression of fitness across generations of population RP1. The stronger line represents the fitness score of the best agent in the selected agents pool while the lighter line represents the average fitness of this pool.
As presented in Figure 4.1, this population’s fitness increased rapidly at first but soon reached a plateau around the five hundredth generation, showing that these agents were near their ceiling performance. However, once tested one hundred and twenty times in the same conditions in which it had evolved, the best RP1 agent’s average performance score was only 5.869. The RP1 agents evolved with no noise regarding either initial position or orientation. This decrease in fitness therefore reveals that these agents were very sensitive to the small noise generated by their own sensors. These tests on the best RP1 agent also reveal that its scores differed considerably across the four scenarios, confirmed by a Welch ANOVA test with F(119, 463996.872) = 9.57 × 10^31, p < 0.001, eta2 = 0.877. Post hoc pairwise analysis through t-tests assuming unequal variances revealed the differences between scenario 1 and all others to be significant. The comparison between scenarios 1 and 2 yielded t(29) = 9.103, d = 2.518, CI 95% [4.64;6.63]; between scenarios 1 and 3, t(29) = 17.546, d = 0.604, CI 95% [7.56;9.56]; and between scenarios 1 and 4, t(29) = 17.618, d = 0.604, CI 95% [7.593;8.588]. All p-values were below 0.001. The difference between scenarios 2 and 3 was significant with t(29) = 149.613, p < 0.001, d = 39.290, CI 95% [2.890;2.970], and the difference between scenarios 2 and 4 was also significant with t(29) = 580.757, p < 0.001, d = 152.517, CI 95% [2.947;2.968]. The difference between scenarios 3 and 4 was not significant with t(29) = 1.399, p = 0.171, d = 0.368, CI 95% [-0.012;0.069]. When observing this agent’s behaviour in simulation, it becomes apparent that its strategy is to always attempt to solve the double-T maze for the bottom-left terminal, the goal of scenario 1.
Upon the presentation of the critical stimulus (the coloured patch of floor at the initial position of the agent), this agent immediately approaches the maze’s wall to its left and then simply follows it until the end of the trial.
This grants the agent a good performance for scenario 1, where it achieves, on average, a performance score of 11.557, which implies travelling part of the way back to the starting position. With this strategy the agent also achieves a mediocre performance for scenario 2, whose objective is to find the top-left terminal.
Its performance is the worst for scenarios 3 and 4, with their goals located on the right side of the maze.
Figure 4.2: Average performance scores of the best RP1 agent across the four different scenarios in conditions identical to those it had evolved in. Each scenario is colour coded according to the colours of the terminals of the maze, as shown in Figure 3.1 so that Scenario 1 corresponds to the scenario where a blue stimulus is presented and the agent has to travel to the bottom-left terminal; Scenario 2 corresponds to the scenario where a red stimulus is presented and the agent has to travel to the top-left terminal; Scenario 3 corresponds to the scenario where a yellow stimulus is presented and the agent has to travel to the top-right terminal; and Scenario 4 corresponds to the scenario where a purple stimulus is presented and the agent has to travel to the bottom-right terminal.
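The pairwise comparisons reported above are unequal-variance (Welch) t-tests accompanied by an effect size. As a minimal sketch of how such statistics can be computed from two lists of per-trial scores, the following uses only the Python standard library. The function name `welch_t` and the example scores are illustrative and are not taken from the thesis. The text does not say which variant of Cohen's d was used; the sketch assumes the variant that divides by the root mean square of the two standard deviations.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df, d) for two independent samples with unequal variances.

    t  : Welch's t statistic
    df : Welch-Satterthwaite degrees of freedom
    d  : Cohen's d using the root mean square of the two SDs (an assumed variant)
    """
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # sample variances (n - 1 denominator)
    se2 = va / na + vb / nb            # squared standard error of the mean difference
    diff = mean(a) - mean(b)
    t = diff / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    d = diff / math.sqrt((va + vb) / 2)
    return t, df, d


# Made-up scores for two scenarios, purely for illustration.
scenario_a = [1, 2, 3, 4, 5]
scenario_b = [2, 4, 6, 8, 10]
t, df, d = welch_t(scenario_a, scenario_b)
print(f"t({df:.2f}) = {t:.3f}, d = {d:.3f}")
```

The p-value and confidence interval additionally require the t distribution, which the standard library does not provide; `scipy.stats.ttest_ind(a, b, equal_var=False)` is one way to obtain the p-value.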
After testing this agent four times for each angle of initial orientation in the interval [-15º;15º], it was visible that, despite having evolved with an initial orientation fixed at 0º, this agent performed considerably well when this noise was introduced. Across the different test conditions (no-pen, increased-noise and no-pen+increased-noise), the best RP1 agent’s behaviour did not appear to change: it behaved the same way regardless of the manipulation it was subject to. This was confirmed through a one-way ANOVA, F(3,476) = 0.092, p = 0.964, eta2 = 0.0005.
Figure 4.3: Average performance scores of the best RP1 agent for each initial orientation in the interval [-15º;15º], across the four different scenarios. Each scenario is colour coded according to the colours of the terminals of the maze, as shown in Figure 3.1. The dotted grey line labelled ‘threshold’ marks the score of nine points, obtained when the agent reaches the correct terminal.
Figure 4.4: Average performance scores of the best RP1 agent across the different test conditions.
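The one-way ANOVA reported above, F(3,476), has degrees of freedom k - 1 and N - k for k groups and N observations in total. The following is a minimal sketch of the F statistic and eta squared computed from raw scores, using only the standard library. The function names and the toy data are illustrative assumptions and do not come from the thesis.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within, eta_squared) for a list of score lists."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Variability of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of the scores around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    eta_sq = ss_between / (ss_between + ss_within)
    return f, df_between, df_within, eta_sq


def eta_sq_from_f(f, df_between, df_within):
    """Recover eta squared from a reported F and its degrees of freedom."""
    return f * df_between / (f * df_between + df_within)


# Toy data: three conditions with three made-up scores each.
conditions = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f, df1, df2, eta_sq = one_way_anova(conditions)
print(f"F({df1},{df2}) = {f:.3f}, eta2 = {eta_sq:.3f}")
```

The helper `eta_sq_from_f` is convenient for checking a reported effect size against its F value without access to the raw scores.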
Although the final solution of the RP1 evolution did not discriminate between the different stimuli, agents that had distinct behaviours for each stimulus, but limited performance, were observed at earlier stages. This indicated that our stimuli were capable of generating different responses in our agents, so we could proceed with our other evolutions.
We then launched an evolution of one population for each type of agent for Robot1. These populations were R1, of reactive agents; C1, of cognitive agents without pen; and Cpen1, of cognitive agents with pen. These populations did experience noise regarding initial position and orientation, as we aimed to produce robust agents capable of solving the navigation task. After the programmed four thousand generations, each population generated its own distinct solution. The best reactive agent, of population R1, attained a fitness score of 8.594 and the average of the population was 7.778. The best cognitive agent without pen, belonging to population C1, had a fitness of 6.739 and the population average was 6.430. The best cognitive agent with pen, of population Cpen1, reached a fitness of 9.726 and the average fitness score of this population was 8.845. A one-way ANOVA confirmed that there were differences amongst these populations with F(2,57) = 576.389, p < 0.001, eta2 = 0.953. Post hoc pairwise analysis through t-tests assuming equal variances revealed that all populations were significantly different from one another, with the comparison between populations R1 and C1 yielding t(38) = 22.219, p < 0.001, d = 7.026, CI 95% [1.349;1.349]; between R1 and Cpen1, t(38) = -12.927, p < 0.001, d = 4.088, CI 95% [-1.66;-1.66]; and between C1 and Cpen1, t(38) = -35.005, p < 0.001, d = 11.070, CI 95% [-2.416;-2.416].
Figure 4.5: Average fitness of the three evolved populations of Robot1.
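The effect sizes reported for these population comparisons can be cross-checked against their t statistics. For a pooled-variance t-test, t and Cohen's d are linked through the group sizes. The sketch below assumes two groups of equal size n, so that the reported 38 degrees of freedom imply n = 20 per group; this is an inference from the degrees of freedom and is not stated in the text.

```latex
\begin{align*}
t &= \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\tfrac{1}{n_1} + \tfrac{1}{n_2}}},
\qquad
d = \frac{\bar{x}_1 - \bar{x}_2}{s_p}
\quad\Longrightarrow\quad
|d| = |t| \sqrt{\tfrac{1}{n_1} + \tfrac{1}{n_2}} = |t| \sqrt{\tfrac{2}{n}} \\[4pt]
\text{R1 vs. C1:} \quad & 22.219 \times \sqrt{2/20} \approx 7.026 \\
\text{R1 vs. Cpen1:} \quad & 12.927 \times \sqrt{2/20} \approx 4.088 \\
\text{C1 vs. Cpen1:} \quad & 35.005 \times \sqrt{2/20} \approx 11.070
\end{align*}
```

Under this assumption the three values agree with the reported d = 7.026, 4.088 and 11.070.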
Focusing on the reactive solutions, population R1 showed a similar increase of fitness across generations to population RP1: it increased suddenly at first and a plateau was soon reached. This suggests that, in our environment, there is a limit to the performance our reactive agents can achieve, and that they become stuck in this solution without progressing further. It is curious to note that, despite having evolved for more generations, the average fitness of this population did not approach the fitness of the best agent, as had happened with population RP1.
Figure 4.6: Progression of fitness across generations of population R1. The stronger line represents the fitness score of the best agent in the selected agents pool while the lighter line represents the average fitness of this pool.
The average fitness of the best R1 agent, after being tested in the same conditions in which it evolved, decreased from 8.589 to 6.072. This had already happened with the best RP1 agent. The strategy employed by the best R1 agent was also very similar to that of RP1: this agent always attempted to solve the maze for the bottom-left terminal by following the wall to its left. Thus, it showed a similar pattern of performance across scenarios. A Welch ANOVA confirmed there were differences across scenarios with F(119, 55216.77) = 2.49 × 10^15, p < 0.001, eta2 = 2.830. Post hoc pairwise analysis through t-tests assuming unequal variances revealed the differences between scenario 1 and all others to be significant. The comparison between scenarios 1 and 2 yielded t(29) = 11.622, d = 3.001, CI 95% [6.92;9.82]; between scenarios 1 and 3, t(29) = 16.465, d = 4.251, CI 95% [9.10;11.69]; and between scenarios 1 and 4, t(29) = 16.466, d = 4.252, CI 95% [9.105;11.687]. All p-values were below 0.001. The difference between scenarios 2 and 3 was significant with t(29) = -5.852, p < 0.001, d = 1.511, CI 95% [1.318;2.735], and the difference between scenarios 2 and 4 was also significant with t(29) = 2.854, p < 0.001, d = 1.512, CI 95% [1.32;2.74]. The difference between scenarios 3 and 4 was not significant with t(29) = 0.140, p = 0.889, d = 0.036, CI 95% [0.014;0.036]. The similar pattern is also apparent when comparing Figure 4.2 with Figure 4.7.
Figure 4.7: Average performance scores of the best R1 agent across the four different scenarios in conditions identical to those it had evolved in. Each scenario is colour coded according to the colours of the terminals of the maze, as shown in Figure 3.1.
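For reference, a Welch ANOVA weights each group by its sample size divided by its variance, which is what allows the scenarios to have unequal variances. The following is a minimal sketch of the textbook formulation using only the standard library; the function name and the example scores are illustrative assumptions, not the analysis code used for the results above. In this formulation the numerator degrees of freedom are k - 1, where k is the number of groups.

```python
from statistics import mean, variance


def welch_anova(groups):
    """Return (F, df1, df2) for Welch's one-way ANOVA on a list of score lists.

    Note: a group with zero variance makes its weight undefined (division by zero).
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    # Precision weights: n_i / s_i^2
    weights = [n / variance(g) for n, g in zip(sizes, groups)]
    w_total = sum(weights)
    # Variance-weighted grand mean
    grand = sum(w * m for w, m in zip(weights, means)) / w_total
    between = sum(w * (m - grand) ** 2 for w, m in zip(weights, means)) / (k - 1)
    lam = sum((1 - w / w_total) ** 2 / (n - 1) for w, n in zip(weights, sizes))
    f = between / (1 + 2 * (k - 2) * lam / (k ** 2 - 1))
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * lam)
    return f, df1, df2


# Made-up scores for two scenarios, purely for illustration.
f, df1, df2 = welch_anova([[1, 2, 3, 4, 5], [2, 4, 6, 8, 10]])
print(f"F({df1}, {df2:.2f}) = {f:.3f}")
```

With only two groups, the F value from this sketch equals the square of the Welch t statistic for the same data, which is a convenient sanity check.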
Testing the best R1 agent for all possible initial orientations it could experience in its evolutionary setting revealed that, although some orientations seem to help the agent reach the designated goal, its performance was impervious to initial orientation. This reveals that this agent was more sensitive to noise generated by its sensors than to the noise introduced in its environment. Supporting this idea, a one-way ANOVA revealed no significant differences in the performance of this agent across the different test conditions, with F(5,714) = 0.290, p = 0.918, eta2 = 0.002.
[Plot: R1 performance across scenarios. Axis: Score (0 to 20). Series: Scenario 1, Scenario 2, Scenario 3, Scenario 4.]
Figure 4.8: Performance scores of the best R1 robot across the four different scenarios for each initial orientation it could experience in its evolutionary conditions. Each scenario is colour coded according to the colours of the terminals of the maze, as shown in Figure 3.1. The dotted grey line labelled ‘threshold’ marks the score of nine points, obtained when the agent reaches the correct terminal.
Figure 4.9: Average performance scores of the best R1 agent across the different test conditions.
Regarding our cognitive agents without pen, the evolution of population C1 followed a different pattern from RP1 and R1. It still had an exponential phase in the beginning. However, instead of reaching a flat plateau, the fitness increase merely slowed down and kept an upwards tendency until the end of the evolution. Yet, the best C1 agent only attained a fitness of 6.739. Once tested one hundred and twenty times in the same conditions it had evolved in, this agent’s average fitness was 4.494, again revealing the fragility of the agents’ behaviour.
Figure 4.10: Progression of fitness across generations of population C1. The stronger line represents the fitness score of the best agent in the selected agents pool while the lighter line represents the average fitness of this pool.
A one-way ANOVA, F(3,116) = 1.446, p = 0.233, eta2 = 0.037, revealed that this agent had a similar performance for all four scenarios. While observing this agent in simulation, its behaviour appeared limited, consisting only of accelerating forward and relying on the random fluctuations of the initial orientation to sometimes attain a higher score.
Figure 4.11: Average performance scores of the best C1 agent across the four different scenarios in conditions identical to those it had evolved in. Each scenario is colour coded according to the colours of the terminals of the maze, as shown in Figure 3.1.
After testing the best C1 agent for all possible initial orientations it could experience in its evolutionary setting, its dependence on initial orientation became clearer. Although the favourable angles for which the agent could attain a higher fitness were scattered, a pattern could be observed. Positive angles of initial orientation, which mean the agent will start its trial facing the wall to its right, seem exploitable by the agent to approach the terminals on the left of the maze. Negative angles of initial orientation seem to be exploitable to approach the terminals on the right. A one-way ANOVA comparing the best C1 agent across the three test conditions it was subject to (control, no-noise and increased-noise) revealed differences in its performance with F(2, 357) = 18.234, p < 0.001, eta2 = 0.102. Post hoc Dunnett’s test revealed that the difference between the control condition (the evolution condition) and the no-noise condition was significant. This further highlights that this agent is dependent on the fluctuations of initial orientation to attain higher fitness.
Figure 4.12: Performance scores of the best C1 robot across the four different scenarios for each initial orientation it could experience in its evolutionary conditions. Each scenario is colour coded according to the colours of the terminals of the maze, as shown in Figure 3.1. The dotted grey line labelled ‘threshold’ marks the score of nine points, obtained when the agent reaches the correct terminal.
Figure 4.13: Average performance scores of the best C1 agent across the different test conditions.
Turning to the cognitive agents with pen, the evolution of population Cpen1 followed a pattern similar to the evolution of C1. It also began with an exponential growth that soon slowed down but kept an upwards tendency. During evolution the best Cpen1 agent reached a fitness of 9.726, but when tested afterwards its average fitness was only 5.148, once again revealing the fragility of the agents’ behaviour.
Figure 4.14: Progression of fitness across generations of population Cpen1. The stronger line represents the fitness score of the best agent in the selected agents pool while the lighter line represents the average fitness of this pool.
A Welch ANOVA comparing the best Cpen1 agent’s performance across the four scenarios revealed that this agent performed differently in different scenarios, with F(119, 51838.95) = 156768.9, p < 0.001, eta2 = 0.195. Post hoc pairwise analysis through t-tests assuming unequal variances revealed that the significant differences were between scenarios 1 and 3, with t(32) = -3.208, p = 0.003, d = 0.828, CI 95% [-5.052;-1.128]; between scenarios 1 and 4, with t(58) = -5.478, p < 0.001, d = 1.414, CI 95% [-4.110;-4.110]; and between scenarios 2 and 4, with t(57) = 3.148, p = 0.002, d = 0.813, CI 95% [1.101;4.952]. No other comparison reached statistical significance.
Figure 4.15: Average performance scores of the best Cpen1 agent across the four different scenarios in conditions identical to those it had evolved in. Each scenario is colour coded according to the colours of the terminals of the maze, as shown in Figure 3.1.
Observing this agent behave in simulation, it was possible to note that it had a greater ability to find the terminals on the right side of the maze (scenarios 3 and 4). Looking at its performance when tested for all initial orientations possible during evolution, no pattern was recognisable, although some initial orientations would clearly yield better scores. This agent also seemed to be resistant to the manipulations it was subject to. A one-way ANOVA revealed there were differences across the different conditions in which it was tested, with F(5,714) = 2.376, p = 0.03, eta2 = 0.016. However, post hoc Dunnett’s test did not find the difference between any test condition and the control condition to be significant.
Figure 4.16: Performance scores of the best Cpen1 robot across the four different scenarios for each initial orientation it could experience in its evolutionary conditions. Each scenario is colour coded according to the colours of the terminals of the maze, as shown in Figure 3.1. The dotted grey line labelled ‘threshold’ marks the score of nine points, obtained when the agent reaches the correct terminal.
Figure 4.17: Average performance scores of the best Cpen1 agent across the different test conditions.
Given these data, we felt the need to evolve agents using a different robot, one with more sensors that could potentially solve the maze better. This was when we came up with Robot2. We also reduced the noise regarding initial orientation during evolutions of this robot, as this could make the task easier and facilitate the evolution of optimal solutions. We launched three evolutions, but only one evolution of cognitive agents with pen was able to run until completion. Like the other populations of cognitive agents (C1 and Cpen1), the evolution of Cpen2 started with an exponential growth of fitness that slowed down but kept increasing until the end of the evolution. During evolution, the best Cpen2 agent obtained a fitness of 8.205 and the average of the population was 7.716.
After being tested in control conditions, the best Cpen2 agent scored, on average, 4.702. When comparing this agent with the best Cpen1 (the other best cognitive agent with pen) in control conditions, no statistical significance was found through a t-test assuming unequal variances.
Figure 4.18: Progression of fitness across generations of population Cpen2. The stronger line represents the fitness score of the best agent in the selected agents pool while the lighter line represents the average fitness of this pool.
Figure 4.19: Average fitness of the two evolved populations of cognitive agents with pen.
A Welch ANOVA revealed this agent behaved differently for at least one scenario, with F(119, 57019.16) = 1407853, p < 0.001, eta2 = 0.323. Post hoc pairwise analysis revealed the significant differences were between scenarios 1 and 3, with t(34) = -4.199, p < 0.001, d = 1.084, CI 95% [-3.300;-1.148]; between scenarios 1 and 4, with t(31) = -5.363, p < 0.001, d = 1.385, CI 95% [-5.597;-2.513]; between scenarios 2 and 3, with t(35) = -4.343, p < 0.001, d = 1.121, CI 95% [-3.396;-1.121]; and between scenarios 2 and 4, with t(32) = -5.466, p < 0.001, d = 1.411, CI 95% [-5.690;-2.600].
Figure 4.20: Average performance scores of the best Cpen2 agent across the four different scenarios in conditions identical to those it had evolved in. Each scenario is colour coded according to the colours of the terminals of the maze, as shown in Figure 3.1.
It is hard to find patterns when examining the best Cpen2 agent’s behaviour and analysing its performance across the initial orientations possible during evolution. It would seem that the top-right terminal is more easily found when initial orientations make the robot face the left wall. However, finding the bottom-right terminal seems to be facilitated by many orientations randomly scattered within the [-30º;30º] range, and terminals on the left side of the maze are never found. A Welch ANOVA comparing the agent’s performance across the different test conditions yielded F(479,3844679) = 5044078, p < 0.001, eta2 = 0.032, revealing there were differences across manipulations. Post hoc Dunnett’s test revealed the no-pen+no-noise condition to be significantly different from the control.
Figure 4.21: Performance scores of the best Cpen2 robot across the four different scenarios for each initial orientation it could experience in its evolutionary conditions. Each scenario is colour coded according to the colours of the terminals of the maze, as shown in Figure 3.1. The dotted grey line labelled ‘threshold’ marks the score of nine points, obtained when the agent reaches the correct terminal.
Figure 4.22: Average performance scores of the best Cpen2 agent across the different test conditions.