
While reviewing other works related to traffic light management using RL, we found multiple techniques that could improve our results. This section suggests some of them and describes how they could impact the results.

6.2.1 State Representation

As seen in Chapter 3, different authors adopt different state representations in their methods, and it is not straightforward to understand which features are the most important to represent.


The current traffic light phase is a feature that all of them use but that is missing from our state representation. Our reward function is even influenced by whether or not the phase changes between steps, yet our agents have no way of knowing whether they are keeping the same phase because the environment does not provide that information to them. Including this information is simple and could improve the results.
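As an illustration, a minimal sketch of how the phase could be appended to the state is shown below. It assumes a flat NumPy state vector and a hypothetical number of phases; the actual dimensions would depend on our environment.

import numpy as np

NUM_PHASES = 4  # hypothetical number of phases of the traffic light


def augment_state(base_state: np.ndarray, current_phase: int) -> np.ndarray:
    """Append a one-hot encoding of the current phase to the flat state vector."""
    phase_one_hot = np.zeros(NUM_PHASES, dtype=base_state.dtype)
    phase_one_hot[current_phase] = 1.0
    return np.concatenate([base_state.ravel(), phase_one_hot])


# Example: a 10-dimensional observation taken while phase 2 is active
state = np.random.rand(10)
print(augment_state(state, current_phase=2).shape)  # (14,)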

One feature that we did include is an image representation of the intersection. Although images provide many features with little programming effort, most of the pixels in those images are irrelevant.

A rough analysis of Figure 4.2 (p. 24) shows that around 60% of the image represents the road surroundings, which are irrelevant for controlling the traffic light. Removing this information from the images is complex, but it can improve the agents' training performance.
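A simple way to discard the surroundings, as a rough sketch, would be to crop each frame to a fixed bounding box around the junction before feeding it to the agent. The box coordinates below are hypothetical and would have to be measured on the actual rendering used in Figure 4.2.

import numpy as np


def crop_to_intersection(frame: np.ndarray, box=(80, 80, 176, 176)) -> np.ndarray:
    """Keep only the pixels inside the (top, left, bottom, right) bounding box;
    everything outside it (the road surroundings) is discarded."""
    top, left, bottom, right = box
    return frame[top:bottom, left:right]


# Example: a 256x256 RGB frame reduced to the 96x96 area around the junction
frame = np.zeros((256, 256, 3), dtype=np.uint8)
print(crop_to_intersection(frame).shape)  # (96, 96, 3)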

6.2.2 Flipping and Rotating Traffic

Zheng et al. [33] conducted a traffic light control experiment exploring the rotation and flipping of traffic in intersections with four roads. Looking at Figure 6.1, it is possible to see that when we rotate an intersection by 90, 180, or 270 degrees, we get a similar situation. Moreover, for each of those four situations, if we swap the east road with the west road, we obtain four more similar situations. In the open world (where humans drive the vehicles), these flipped situations are typical, for example when most people go from home to work in the morning (e.g. east to west) and from work to home in the evening (e.g. west to east).

The utility of flipping and rotating traffic situations is that, from a single observation of the environment, it is possible to generate eight different (but similar) states that our RL agents must be able to handle. This technique can make our agents perform better in scenarios they have never faced.

Figure 6.1: Possible variations based on flipping and rotation of the top-left case, from [33].
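For image-based states such as ours, a minimal sketch of this augmentation could rely on NumPy rotations and flips, as shown below; the phase labels and actions associated with each variant would also need the matching permutation, which the sketch does not cover.

import numpy as np


def symmetric_variants(frame: np.ndarray) -> list:
    """Generate the eight rotation/flip variants of an intersection image:
    rotations of 0, 90, 180 and 270 degrees, each optionally flipped
    left-right (swapping the east and west roads)."""
    variants = []
    for k in range(4):
        rotated = np.rot90(frame, k)
        variants.append(rotated)
        variants.append(np.fliplr(rotated))
    return variants


# Example: one observation becomes eight training samples
observation = np.random.rand(96, 96, 3)
print(len(symmetric_variants(observation)))  # 8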


6.2.3 Multi-agent Scenarios

Until now, we have only explored scenarios with a single intersection. From real-world scenarios, we know that when multiple traffic lights exist in a more or less complex road network, the state of one traffic light can influence the traffic at the surrounding ones. In these scenarios, the goal is not to optimise the traffic flow of each intersection individually but to optimise the traffic flow of the entire road network. Multi-Agent Reinforcement Learning (MARL) is a field of RL in which the actions of one agent influence the rewards received by all agents [8].

A step forward in improving this work is exploring MARL methods to control multiple AGV traffic intersections. However, these methods come with new challenges, which Nguyen et al. [15] explored. The five main challenges are the partial observability of each agent, the non-stationarity of the environment, dealing with continuous action spaces, multi-agent training schemes so that all agents train together, and transfer learning in multi-agent systems.

Some authors have already explored the use of MARL approaches to control multiple traffic lights. One of the early approaches, by van der Pol and Oliehoek [25], uses multi-agent DQN with transfer learning, but the training is not always stable. PressLight, from Wei et al. [31], is another work that introduces the concept of pressure of intersections to coordinate multiple DQN agents. A more recent work by Chen et al., MPLight [3], improves on PressLight by using agents with the FRAP architecture explored by Zheng et al. [33]. According to the authors, FRAP is an architecture that "is invariant to symmetric operations like Flipping and Rotation and considers All Phase configurations".
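As an illustration of the idea behind PressLight, a rough sketch of the pressure of an intersection is given below: the imbalance between vehicles on incoming and outgoing lanes. The exact lane pairing and any normalisation would have to follow the original paper [31].

def intersection_pressure(incoming_queues, outgoing_queues):
    """Rough sketch of the pressure of an intersection: the difference between
    the number of vehicles waiting on the incoming lanes and the number of
    vehicles already on the outgoing lanes. A high pressure suggests the
    intersection should release traffic."""
    return sum(incoming_queues) - sum(outgoing_queues)


# Example: 12 vehicles queued on incoming lanes and 5 on outgoing lanes
print(intersection_pressure([4, 3, 5], [2, 1, 2]))  # 7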

Using multi-agent methods would allow us to complement the work already developed with the capacity to handle scenarios with a higher degree of complexity.

References

[1] Joshua Achiam. Spinning up in deep reinforcement learning. GitHub repository, 2018.

[2] Alvaro Cabrejas-Egea, Raymond Zhang, and Neil Walton. Reinforcement learning for traffic signal control: Comparison with commercial systems. Transportation Research Procedia, 58:638–645, 2021.

[3] Chacha Chen, Hua Wei, Nan Xu, Guanjie Zheng, Ming Yang, Yuanhao Xiong, Kai Xu, and Zhenhui Li. Toward a thousand lights: Decentralized deep reinforcement learning for large-scale traffic signal control. In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI'20), New York, NY, 2020.

[4] Jim X. Chen. The evolution of computing: Alphago. Computing in Science Engineering, 18(4):4–7, 2016.

[5] Seung-Bae Cools, Carlos Gershenson, and Bart D'Hooghe. Self-organizing traffic lights: A realistic simulation. In Advances in Applied Self-organizing Systems, pages 41–50. Springer London, November 2007.

[6] M. De Ryck, M. Versteyhe, and F. Debrouwere. Automated guided vehicle systems, state-of-the-art control algorithms and techniques. Journal of Manufacturing Systems, 54:152–173, 2020.

[7] Zihan Ding, Yanhua Huang, Hang Yuan, and Hao Dong. Introduction to Reinforcement Learning, pages 47–123. Springer Singapore, Singapore, 2020.

[8] Pablo Hernandez-Leal, Bilal Kartal, and Matthew E Taylor. A survey and critique of multiagent deep reinforcement learning. Autonomous Agents and Multi-Agent Systems, 33(6):750–797, 2019.

[9] Beakcheol Jang, Myeonghwi Kim, Gaspard Harerimana, and Jong Wook Kim. Q-learning algorithms: A comprehensive classification and applications. IEEE Access, 7:133653–133667, 2019.

[10] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.

[11] M.L. Littman. Markov decision processes. In Neil J. Smelser and Paul B. Baltes, editors, International Encyclopedia of the Social & Behavioral Sciences, pages 9240–9242. Pergamon, Oxford, 2001.

[12] Pablo Alvarez Lopez, Michael Behrisch, Laura Bieker-Walz, Jakob Erdmann, Yun-Pang Flötteröd, Robert Hilbrich, Leonhard Lücken, Johannes Rummel, Peter Wagner, and Evamarie Wießner. Microscopic traffic simulation using SUMO. In The 21st IEEE International Conference on Intelligent Transportation Systems. IEEE, 2018.



[13] Alan J. Miller. Settings for fixed-cycle traffic signals. Journal of the Operational Research Society, 14(4):373–386, 1963.

[14] Alan J. Miller. Settings for fixed-cycle traffic signals. Journal of the Operational Research Society, 14:373–386, 1963.

[15] Thanh Thi Nguyen, Ngoc Duy Nguyen, and Saeid Nahavandi. Deep reinforcement learning for multiagent systems: A review of challenges, solutions, and applications. IEEE Transac- tions on Cybernetics, 50(9):3826–3839, 2020.

[16] John Nickolls and William J Dally. The GPU computing era. IEEE Micro, 30(2):56–69, 2010.

[17] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Z. Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. Pytorch: An imperative style, high-performance deep learning library. CoRR, abs/1912.01703, 2019.

[18] Péter Pálos and Árpád Huszák. Comparison of Q-learning based traffic light control methods and objective functions. In 2020 International Conference on Software, Telecommunications and Computer Networks (SoftCOM), pages 1–6, 2020.

[19] Mohit Sewak. Deep Q Network (DQN), Double DQN, and Dueling DQN, pages 95–108. Springer Singapore, Singapore, 2019.

[20] Mohit Sewak. Policy-Based Reinforcement Learning Approaches, pages 127–140. Springer Singapore, Singapore, 2019.

[21] Stephen F. Smith, Gregory J. Barlow, Xiao-Feng Xie, and Zachary B. Rubinstein. Surtrac: Scalable urban traffic control. 2013.

[22] Daniel W Stroock. An introduction to Markov processes, volume 230. Springer Science & Business Media, 2013.

[23] Richard S Sutton and Andrew G Barto. Reinforcement learning: An introduction. MIT Press, 2018.

[24] Elise van der Pol and Frans A. Oliehoek. Coordinated deep reinforcement learners for traffic light control. 2016.

[25] Elise Van der Pol and Frans A Oliehoek. Coordinated deep reinforcement learners for traffic light control. Proceedings of learning, inference and control of multi-agent systems (at NIPS 2016), 1, 2016.

[26] Cristina Vilarinho, José Pedro Tavares, and Rosaldo J. F. Rossetti. Intelligent traffic lights: Green time period negotiation. Transportation Research Procedia, 22:325–334, 2017.

[27] R. A. Vincent and J. R. Peirce. MOVA: Traffic responsive, self-optimising signal control for isolated intersections. TRRL Research Report, 1988.

[28] Iris FA Vis. Survey of research in the design and control of automated guided vehicle systems. European Journal of Operational Research, 170(3):677–709, 2006.

[29] Xuesong Wang, Haopeng Xiang, Yuhu Cheng, and Qiang Yu. Prioritised experience replay based on sample optimisation. The Journal of Engineering, 2020(13):298–302, 2020.
