
Algorithm 7 – Implemented TD(λ) algorithm

5.2 FUTURE WORK

During the development and implementation of the control system for the humanoid robots (Newton and B1) and of the algorithm proposed in this work (Algorithm 7), several ideas for follow-up work were discussed. This section briefly describes some of them.

A first line of future work is to model the mechanical construction of the humanoid robots of Centro Universitário da FEI (Newton and B1) in the Webots simulator. Once the models exist, Algorithm 7 should be run in simulation to search for new values of the gait parameters, and the learned values should then be transferred to the real robots (transfer of the learned policy). This would make it possible to analyze how the agent behaves in the real world when it starts from knowledge acquired in a simulated environment (knowledge transfer from the simulated environment to the real one).

Another possible line of work is to optimize the gait parameters with the CMA-ES algorithm (Covariance Matrix Adaptation Evolution Strategy) (HANSEN, 2009). CMA-ES generates populations of candidate solutions in a way similar to a genetic algorithm, and it has been used in works such as MacAlpine et al. (2012) and Farchy et al. (2013), which apply CMA-ES to optimize the gait parameters of the NAO robot.
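The population-based idea can be illustrated with a deliberately simplified evolution strategy. The sketch below is not CMA-ES: it keeps the sampling covariance fixed to the identity and shrinks the step size by a constant factor, whereas CMA-ES adapts both. The cost function `toy_gait_cost`, its target values, and all default settings are hypothetical placeholders for a real gait evaluation.

```python
import math
import random


def toy_gait_cost(params):
    """Hypothetical cost: squared distance to an arbitrary 'good' parameter set."""
    target = [0.6, 1.5, -0.3]
    return sum((p - t) ** 2 for p, t in zip(params, target))


def simple_es(objective, x0, sigma=0.5, popsize=12, generations=60, seed=0):
    """Minimise `objective` with a simplified (mu/mu_w, lambda) evolution strategy."""
    rng = random.Random(seed)
    mu = popsize // 2
    # Log-rank recombination weights: better-ranked candidates count more.
    raw = [math.log(mu + 0.5) - math.log(rank + 1) for rank in range(mu)]
    weights = [w / sum(raw) for w in raw]
    mean = list(x0)
    best_x, best_f = list(x0), objective(x0)
    for _generation in range(generations):
        # Sample a population of candidates around the current mean.
        population = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean]
                      for _sample in range(popsize)]
        ranked = sorted(population, key=objective)
        if objective(ranked[0]) < best_f:
            best_x, best_f = ranked[0], objective(ranked[0])
        # Move the mean to the weighted average of the mu best candidates.
        elite = ranked[:mu]
        mean = [sum(w * x[d] for w, x in zip(weights, elite))
                for d in range(len(mean))]
        sigma *= 0.95  # crude fixed decay; CMA-ES adapts the step size instead
    return best_x, best_f


best_params, best_cost = simple_es(toy_gait_cost, [0.0, 0.0, 0.0])
```

In a real setting, each call to the objective would correspond to one walking trial in the simulator, which is why the number of evaluations per generation matters.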

Heuristics, as proposed by Bianchi (2004), could also be explored in future work. Defining a heuristic for this problem is not trivial, however, since the optimal parameter values are unknown. A suitable heuristic would nonetheless be expected to reduce the learning time.
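As a rough illustration of how a heuristic can bias learning, heuristically accelerated methods typically add a weighted heuristic term H(s, a) to the value estimate when choosing an action. The sketch below shows only that greedy selection step for a single state; the action names, the values in `q_values` and `heuristic`, and the weight `xi` are hypothetical and are not taken from this work.

```python
def heuristic_greedy_action(q_values, heuristic, xi=1.0):
    """Return the action maximising Q(s, a) + xi * H(s, a) for one state.

    `q_values` and `heuristic` map action names to numbers; actions missing
    from `heuristic` receive no bonus.
    """
    return max(q_values, key=lambda a: q_values[a] + xi * heuristic.get(a, 0.0))


# Hypothetical values for a single state of a gait-tuning task.
q_values = {"increase_step": 0.20, "decrease_step": 0.25, "keep": 0.10}
heuristic = {"increase_step": 0.10}

plain_choice = heuristic_greedy_action(q_values, heuristic, xi=0.0)
biased_choice = heuristic_greedy_action(q_values, heuristic, xi=1.0)
```

With `xi=0.0` the choice follows the value estimates alone; with `xi=1.0` the heuristic bonus tips the choice toward `increase_step`. The difficulty noted in the text remains: someone has to supply sensible values for the heuristic.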

One way to reduce training time is to run the learning in parallel, as done by MacAlpine et al. (2012), who report running their simulations on a cluster. This would make it possible to test learning with more parameters.
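The following minimal sketch shows the general pattern of scoring several candidate parameter sets concurrently. It uses a thread pool only to stay self-contained; a real setup would more likely launch separate simulator processes or cluster jobs. The function `simulate_walk` and its toy score are hypothetical stand-ins for a full simulation episode.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate_walk(params):
    """Hypothetical stand-in for one simulator episode; returns a score."""
    step_length, period = params
    return step_length / period  # toy "speed": distance per unit of time


def evaluate_in_parallel(candidates, workers=4):
    """Score every candidate parameter set concurrently, preserving order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(simulate_walk, candidates))


candidates = [(20.0, 600.0), (20.0, 1500.0), (10.0, 600.0)]
scores = evaluate_in_parallel(candidates)
best = candidates[scores.index(max(scores))]
```

Because each evaluation is independent, the wall-clock time per learning iteration is bounded by the slowest episode rather than by the sum of all episodes.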

Multi-agent reinforcement learning could also be implemented to reduce learning time, making it feasible to increase the number of parameters to be learned. The Webots simulator allows more than one robot in the simulation environment; with such a setup, the agents could share knowledge in order to reach the objective.

REFERENCES

ALDEBARAN. Robô humanóide NAO. 2015. Available at: <https://www.aldebaran.com/en/humanoid-robot/nao-robot>. Accessed on: 27 Feb. 2015.

ALI, M. et al. Closed-form inverse kinematic joint solution for humanoid robots. In: IEEE. Intelligent Robots and Systems (IROS), 2010 IEEE/RSJ International Conference on. [S.l.], 2010. p. 704–709.

ASIMO. Site da Honda. 2015. Available at: <http://asimo.honda.com/>. Accessed on: 26 Feb. 2015.

BIANCHI, R. A. d. C. Uso de heurísticas para a aceleração do aprendizado por reforço. 2004. Thesis (Doctorate) – Universidade de São Paulo.

BRAFMAN, R. I.; TENNENHOLTZ, M. R-max - a general polynomial time algorithm for near-optimal reinforcement learning. The Journal of Machine Learning Research, JMLR.org, v. 3, p. 213–231, 2003.

CELIBERTO, L. A. Aprendizado por Reforço Acelerado por Heurísticas no Domínio do Futebol de Robôs Simulado. 2007. Dissertation (Master's) – Centro Universitário da FEI.

COLLINS, S. et al. Efficient bipedal robots based on passive-dynamic walkers. Science, American Association for the Advancement of Science, v. 307, n. 5712, p. 1082–1085, 2005.

COPPELIA ROBOTICS. Site do modelo dinâmico do V-REP. 2015. Available at: <http://www.coppeliarobotics.com/helpFiles/en/dynamicsModule.htm>. Accessed on: 14 Apr. 2015.

CORTEZ, M. P.; TONIDANDEL, F. Projeto Mecânico de um Robô Humanoide - Futebol de Robôs - Humanoid League. [S.l.], 2010. Undergraduate research project (Iniciação Científica).

CRAIG, J. J. Introduction to robotics: mechanics and control. Upper Saddle River: Pearson Prentice Hall, 2005.

DARWIN-OP. Código em C++ do DARwIn-OP. 2015. Available at: <https://github.com/darwinop-ens/darwin-op>. Accessed on: 11 Feb. 2015.

DONG, H. et al. Hardware design and gait generation of humanoid soccer robot stepper-3d. Robotics and Autonomous Systems, Elsevier, v. 57, n. 8, p. 828–838, 2009.

DRC. DARPA Robotics Challenge. 2015. Available at: <http://www.theroboticschallenge.org/overview>. Accessed on: 26 Feb. 2015.

DYNAMIXEL, R. RX-28’s Manual. 2015. Available at: <http://support.robotis.com/en/product/dynamixel/rx_series/rx-28.htm>. Accessed on: 10 Feb. 2015.

FARCHY, A. et al. Humanoid robots learning to walk faster: From the real world to simulation and back. In: INTERNATIONAL FOUNDATION FOR AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS. Proceedings of the 2013 international conference on autonomous agents and multi-agent systems. [S.l.], 2013. p. 39–46.

GARCIA, E.; TONIDANDEL, F. Eletrônica do Humanoide. [S.l.], 2011. Undergraduate research project (Iniciação Científica).

GAZEBO. Site do simulador Gazebo. 2015. Available at: <http://gazebosim.org/>. Accessed on: 14 Apr. 2015.

GOSAVI, A. Reinforcement learning: A tutorial survey and recent advances. INFORMS Journal on Computing, INFORMS, v. 21, n. 2, p. 178–192, 2009.

HA, I.; TAMURA, Y.; ASAMA, H. Gait pattern generation and stabilization for humanoid robot based on coupled oscillators. In: IEEE. Intelligent Robots and Systems (IROS), 2011 IEEE/RSJ International Conference on. [S.l.], 2011. p. 3207–3212.

HA, I. et al. Development of Open Humanoid Platform DARwIn-OP. In: Proceedings of SICE Annual Conference (SICE). [S.l.]: IEEE, 2011. p. 2178–2181. ISBN 978-1-4577-0714-8.

HANSEN, N. Algoritmo CMA-ES. 2009. Available at: <https://www.lri.fr/~hansen/cmatutorial.pdf>. Accessed on: 01 Jul. 2015.

HESTER, T.; QUINLAN, M.; STONE, P. Generalized model learning for reinforcement learning on a humanoid robot. In: IEEE. Robotics and Automation (ICRA), 2010 IEEE International Conference on. [S.l.], 2010. p. 2369–2374.

HESTER, T.; STONE, P. Generalized model learning for reinforcement learning in factored domains. In: INTERNATIONAL FOUNDATION FOR AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS. Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems-Volume 2. [S.l.], 2009. p. 717–724.

HONDA. Manual de informações técnicas do ASIMO. 2015. Available at: <http://asimo.honda.com/downloads/pdf/asimo-technical-information.pdf>. Accessed on: 26 Feb. 2015.

IHMC. Florida Institute For Human and Machine Cognition. 2015. Available at: <http://www.ihmc.us/>. Accessed on: 08 Jul. 2015.

INTEL. Repositório do código fonte do robô Jimmy. 2015. Available at: <https://github.com/21stCenturyRobot/HROS5-Framework>. Accessed on: 06 Jul. 2015.

INTEL. Site da Intel sobre Robô Jimmy. 2015. Available at: <http://www.intel.com/content/www/us/en/corporate-responsibility/better-future/21st-century-robot-program.html>. Accessed on: 01 Mar. 2015.

INTEL. Site do Robô Jimmy. 2015. Available at: <http://www.21stcenturyrobot.com/>. Accessed on: 01 Mar. 2015.

INTEL NUC. Site da Intel - NUC. 2015. Available at: <http://www.intel.com.br/content/www/br/pt/nuc/overview.html>. Accessed on: 10 Feb. 2015.

KANEKO, K. et al. Humanoid robot hrp-4-humanoid robotics platform with lightweight and slim body. In: IEEE. Intelligent Robots and Systems (IROS), 2011 IEEE/RSJ International Conference on. [S.l.], 2011. p. 4400–4407.

KAWADA. Site da Kawada. 2015. Available at: <http://global.kawada.jp/mechatronics/hrp4.html>. Accessed on: 01 Jul. 2015.

KITANO, H.; ASADA, M. The robocup humanoid challenge as the millennium challenge for advanced robotics. Advanced Robotics, Taylor & Francis, v. 13, n. 8, p. 723–736, 1998.

KITANO, H. et al. Robocup: A challenge problem for ai. AI Magazine, v. 18, n. 1, p. 73–85, 1997.

MACALPINE, P. et al. Design and optimization of an omnidirectional humanoid walk: A winning approach at the robocup 2011 3d simulation competition. In: AAAI. [S.l.: s.n.], 2012.

MACKWORTH, A. K. On seeing robots. In: BASU, A.; LI, X. (Ed.). Computer Vision: Systems, Theory and Applications. Singapore: World Scientific Press, 1993. p. 1–13. Reprinted in P. Thagard (ed.), Mind Readings, MIT Press, 1998.

MARCHESE, S.; MUSCATO, G.; VIRK, G. Dynamically stable trajectory synthesis for a biped robot during the single-support phase. In: IEEE. Advanced Intelligent Mechatronics, 2001. Proceedings. 2001 IEEE/ASME International Conference on. [S.l.], 2001. v. 2, p. 953–958.

MARDER, E.; BUCHER, D. Central pattern generators and the control of rhythmic movements. Current Biology, Elsevier, v. 11, n. 23, p. R986–R996, 2001.

MARTINS, M. F. Aprendizado por reforço acelerado por heurísticas aplicado ao domínio do futebol de robôs. 2007. Dissertation (Master's) – Centro Universitário da FEI.

MCGEER, T. Passive dynamic walking. The International Journal of Robotics Research, Sage Publications, v. 9, n. 2, p. 62–82, 1990.

MITCHELL, T. M. Machine Learning. 1. ed. New York, NY, USA: McGraw-Hill, Inc., 1997. ISBN 0070428077, 9780070428072.

NOTOLLINI, E. M.; BIANCHI, R. A. d. C. Implementação do Sistema Eletrônico de um Robô Humanoide. [S.l.], 2012. Undergraduate research project (Iniciação Científica).

ODE. Site da Open Dynamics Engine. 2015. Available at: <http://www.ode.org/>. Accessed on: 21 Feb. 2015.

OGURA, Y. et al. Development of a new humanoid robot wabian-2. In: IEEE. Robotics and Automation, 2006. ICRA 2006. Proceedings 2006 IEEE International Conference on. [S.l.], 2006. p. 76–81.

OGURA, Y. et al. Human-like walking with knee stretched, heel-contact and toe-off motion by a humanoid robot. In: IEEE. Intelligent Robots and Systems, 2006 IEEE/RSJ International Conference on. [S.l.], 2006. p. 3976–3981.

O’FLAHERTY, R. et al. Kinematics and Inverse Kinematics for the Humanoid Robot HUBO2+. [S.l.]: Georgia Institute of Technology, 2013.

PENG, J.; WILLIAMS, R. J. Incremental multi-step q-learning. Machine Learning, Springer, v. 22, n. 1-3, p. 283–290, 1996.

PERICO, D. H. Uso de Heurísticas obtidas por meio de Demonstrações para Aceleração do Aprendizado por Reforço. 2012. Dissertation (Master's) – Centro Universitário da FEI.

PERICO, D. H. et al. Hardware and software aspects of the design and assembly of a new humanoid robot for robocup soccer. p. 73–78, 2014.

POLOLU. UM6 Ultra-Miniature Orientation Sensor Datasheet. 2015. Available at: <http://www.pololu.com/file/0J442/UM6_datasheet.pdf>. Accessed on: 19 Feb. 2015.

RIGHETTI, L.; IJSPEERT, A. J. Programmable central pattern generators: an application to biped locomotion control. In: IEEE. Robotics and Automation, 2006. ICRA 2006. Proceedings 2006 IEEE International Conference on. [S.l.], 2006. p. 1585–1590.

ROBOCUP. Regras da Liga Humanoide. 2015. Available at: <https://www.robocuphumanoid.org/materials/rules/>. Accessed on: 17 Jun. 2015.

ROBOCUP. Site Oficial da RoboCup. 2015. Available at: <http://www.robocup.org/>. Accessed on: 18 Jun. 2015.

ROBOCUP LIGA HUMANOIDE. Site oficial da liga Humanoide da RoboCup. 2015. Available at: <https://www.robocuphumanoid.org/>. Accessed on: 28 Feb. 2015.

ROBOTIS. Robô humanóide DARwIn-OP. 2015. Available at: <http://www.robotis.com/xe/darwin_en>. Accessed on: 27 Feb. 2015.

RUMMERY, G. A.; NIRANJAN, M. On-line Q-learning using connectionist systems. [S.l.]: University of Cambridge, Department of Engineering, 1994.

RUSSELL, S. J.; NORVIG, P. Artificial intelligence: a modern approach. 3. ed. Upper Saddle River, NJ: Prentice Hall, 2010. ISBN 9780136042594 0136042597 9780132071482 0132071487.

SARDAIN, P.; BESSONNET, G. Forces acting on a biped robot. Center of pressure-zero moment point. Systems, Man and Cybernetics, Part A: Systems and Humans, IEEE Transactions on, IEEE, v. 34, n. 5, p. 630–637, 2004.

SILVA, I. J. da. Repositório do Algoritmo TD. 2015. Available at: <https://github.com/Isaac25silva/TD-lambida--walk-Webots>. Accessed on: 06 Jul. 2015.

SIMSPARK. Simulador SimSpark. 2015. Available at: <http://simspark.sourceforge.net/>. Accessed on: 01 Jul. 2015.

SUTHERLAND, D.; KAUFMAN, K.; MOITOZA, J. Kinematics of normal human walking. Human walking, Williams and Wilkins, Baltimore, MD, p. 23–44, 1994.

SUTTON, R. S. Learning to predict by the methods of temporal differences. Machine Learning, Springer, v. 3, n. 1, p. 9–44, 1988.

SUTTON, R. S. Generalization in reinforcement learning: Successful examples using sparse coarse coding. Advances in neural information processing systems, Citeseer, p. 1038–1044, 1996.

SUTTON, R. S.; BARTO, A. G. Introduction to Reinforcement Learning. 1st. ed. Cambridge, MA, USA: MIT Press, 1998. ISBN 0262193981.

V-REP. Site do V-REP. 2015. Available at: <http://www.coppeliarobotics.com>. Accessed on: 14 Apr. 2015.

VELOSO, M.; STONE, P. Video: Robocup robot soccer history 1997–2011. In: IEEE. Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on. [S.l.], 2012. p. 5452–5453.

VILÃO, C. O. et al. A single camera vision system for a humanoid robot. In: IEEE. Robotics: SBR-LARS Robotics Symposium and Robocontrol (SBR LARS Robocontrol), 2014 Joint Conference on. [S.l.], 2014. p. 181–186.

VUKOBRATOVIĆ, M.; BOROVAC, B. Zero-moment point - thirty five years of its life. International Journal of Humanoid Robotics, World Scientific, v. 1, n. 01, p. 157–173, 2004.

VUKOBRATOVIC, M.; JURICIC, D. Contribution to the synthesis of biped gait. Biomedical Engineering, IEEE Transactions on, IEEE, n. 1, p. 1–6, 1969.

WATKINS, C. J. C. H. Learning from Delayed Rewards. 1989. Thesis (Doctorate) – Cambridge University.

WATKINS, C. J. C. H.; DAYAN, P. Q-learning. Machine Learning, Springer, v. 8, n. 3-4, p. 279–292, 1992.

WEBOTS. Site do Webots. 2015. Available at: <http://www.cyberbotics.com>. Accessed on: 21 Feb. 2015.

WEBOTS. Usuários do Webots. 2015. Available at: <http://www.cyberbotics.com/users>. Accessed on: 21 Feb. 2015.

WESTERVELT, E. R. et al. Feedback control of dynamic bipedal robot locomotion. Boca Raton: CRC Press, 2007. (Control and automation). ISBN 978-1-4200-5372-2. Available at: <http://opac.inria.fr/record=b1134075>.

WFWOLVES. Vídeo de qualificação do WF Wolves. 2014. Available at: <https://www.youtube.com/watch?v=nge-DCk3kko&feature=youtu.be>. Accessed on: 19 Jun. 2015.

WILLIAMS, R. L. Darwin-op humanoid robot kinematics. In: AMERICAN SOCIETY OF MECHANICAL ENGINEERS. ASME 2012 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference. [S.l.], 2012. p. 1187–1196.

ZHAO, M. et al. Humanoid robot gait generation based on limit cycle stability. In: RoboCup 2008: Robot Soccer World Cup XII. [S.l.]: Springer, 2009. p. 403–413.

The graphs in Figures 59, 60, and 61 below show the behavior of the servomotors while the robot walks. They were captured with the walking class attributes X_MOVE_AMPLITUDE = 20 and A_MOVE_AMPLITUDE = 0, and with period_time = 1500 in the config.ini file to produce a slow walk. The value normally used is period_time = 600; below 600 the robot starts to walk unstably. The servomotors shown in the graphs can be identified in Figure 58, which shows the position of each servomotor on the robot.
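As a small illustration of how such a setting could be checked before a run, the sketch below reads period_time from INI text and compares it with the threshold of 600 mentioned above. The section name `Walking Config`, the sample file contents, and the helper names are assumptions made for this example; the actual layout of config.ini is not reproduced here.

```python
import configparser

MIN_STABLE_PERIOD = 600.0  # per the text, below this value the walk becomes unstable

# Hypothetical excerpt of a config.ini file; the section name is an assumption.
SAMPLE_INI = """
[Walking Config]
period_time = 1500
"""


def load_period_time(ini_text, section="Walking Config"):
    """Parse INI text and return the period_time value as a float."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return parser.getfloat(section, "period_time")


def is_stable_period(period_time):
    """Return True when period_time is at or above the reported stable limit."""
    return period_time >= MIN_STABLE_PERIOD


period = load_period_time(SAMPLE_INI)
slow_walk_is_stable = is_stable_period(period)
```

A check like this is only a guard against typing errors; the stability limit itself was observed experimentally and may differ for another robot or surface.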

Figure 58 – Positions of the servomotors on the robot.

Source: DARwIn-OP manual.

Figure 59 – Graphs showing the behavior of the servomotors during walking.

Source: Author.

Figure 60 – Graphs showing the behavior of the servomotors during walking.

Source: Author.

Figure 61 – Graphs showing the behavior of the servomotors during walking.

Source: Author.

The graphs in Figure 62 show three disturbances on the X axis of the gyroscope. They were generated externally by three frontal blows applied to the robot in order to analyze the behavior of the feedback control loop, which tries to keep the robot stable when this kind of situation occurs.

In the graph of the IMU X axis, the three disturbances occur in the intervals from 10 to 12, 16 to 18, and 23 to 25 seconds. The correction that prevented the robot from falling took place in these same intervals. This can be seen in the graphs for the right servos 15 and 16, where small distortions appear in the cyclic pattern of the waveform; servos 15 and 16 are the ones that perform the pitch rotation of the ankle. The graph of the knee servo is not shown because the correction at the knee is smaller than the one at the ankle.

When disturbances occur along the X axis of the robot, the corrections are applied in the X direction: the correction acts on the ankle pitch and on the knee, in proportion to the gyroscope feedback in X.

When disturbances occur along the Y axis of the robot, the corrections are applied in the Y direction: the correction acts on the ankle roll and on the hip roll, in proportion to the gyroscope feedback in Y.
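The proportional mapping described above can be sketched as follows. The gain values, the joint names, and the absence of sign conventions and unit conversions are all assumptions made for illustration; the only properties taken from the text are which joints respond to which gyroscope axis and that the knee correction is smaller than the ankle correction.

```python
from dataclasses import dataclass


@dataclass
class BalanceGains:
    # Hypothetical gains; the text only states that the correction is proportional.
    ankle_pitch: float = 1.0
    knee: float = 0.5        # smaller than the ankle gain, as the text reports
    ankle_roll: float = 1.0
    hip_roll: float = 0.5


def balance_offsets(gyro_x, gyro_y, gains=BalanceGains()):
    """Map gyroscope readings to proportional joint offsets.

    The X reading drives ankle pitch and knee; the Y reading drives
    ankle roll and hip roll.
    """
    return {
        "ankle_pitch": gains.ankle_pitch * gyro_x,
        "knee": gains.knee * gyro_x,
        "ankle_roll": gains.ankle_roll * gyro_y,
        "hip_roll": gains.hip_roll * gyro_y,
    }


# A frontal blow seen as a reading on X only (arbitrary units).
offsets = balance_offsets(gyro_x=10.0, gyro_y=0.0)
```

With a disturbance on X only, the roll joints receive no correction, which matches the observation that the frontal blows show up in the ankle pitch servos.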
