Generalization and anticipation skills for robot ball catching using supervised learning


University of Aveiro, Department of Electronics, Telecommunications and Informatics, 2017

Diogo Carneiro


Técnicas de Generalização e Antecipação para um Robô de Captura de Bolas Usando Aprendizagem Supervisionada


Dissertation presented to the University of Aveiro in fulfilment of the requirements for the degree of Master in Electronics and Telecommunications Engineering, carried out under the scientific supervision of Doctor Filipe Miguel Teixeira Pereira da Silva, Assistant Professor at the Department of Electronics, Telecommunications and Informatics of the University of Aveiro, and of Doctor Pétia Georgieva, Assistant Professor at the Department of Electronics, Telecommunications and Informatics of the University of Aveiro.


The jury

President: Professor Doutor Pedro Nicolau Faria da Fonseca, Assistant Professor at the Department of Electronics, Telecommunications and Informatics of the University of Aveiro

Examiners committee: Professor Doutor Miguel Armando Riem de Oliveira, Assistant Professor at the Department of Mechanical Engineering of the University of Aveiro (Main Examiner)

Professor Doutor Filipe Miguel Teixeira Pereira da Silva, Assistant Professor at the Department of Electronics, Telecommunications and Informatics of the University of Aveiro (Supervisor)


Acknowledgements I would like to thank Professor Filipe Silva and Professor Pétia Georgieva for accepting the challenge of supervising me in this dissertation. Thank you for the support, trust and help along this journey. I thank my family, in particular my parents, Valter and Umbelina, and my sister, Catarina, for all the support they gave me throughout my academic path.

I would also like to thank everyone who supported me, directly or indirectly, whether family, friends or teachers.


Palavras-Chave (Keywords) Human-Robot Interaction, Ball Catching, Machine Learning, Movement Primitives, Neural Networks

Resumo (Summary) Learning-based approaches are one of the most interesting ways of endowing robots with better capabilities in terms of autonomy and adaptation. This dissertation addresses the problem of a robot catching a ball, focusing on the potential of supervised learning techniques to cope with the demands imposed on the perception and action systems. The first part of this dissertation aims to demonstrate that predicting intention by observing human actions is an important capability for robots performing interactive tasks. This work explores the role of anticipations drawn from observing the ball-throwing motion in improving the prediction ability of the robotic system when interacting with a human partner. To that end, a feedforward neural network is trained to estimate the initial position and velocity of the ball in flight, given a sequence of initial observations. The robotic manipulator adopted in this study, with 3 degrees of freedom, reacts to updates of the predicted catching point and time using a Jacobian-based method to obtain an inverse kinematics solution. Several simulations show that the proposed approach outperforms by up to 20% the classical methodology, in which predictions depend solely on information available during the flight phase of the ball. In the second part, this dissertation adopts a bio-inspired approach to generating movements able to cope with the online adaptation requirements of the robot arm. In particular, this study addresses the mathematical formulation of discrete movement primitives to reproduce and generalize a single learned trajectory. The method is validated with a 6-degree-of-freedom robotic manipulator using the V-REP simulator. The simulations carried out show that movement primitives are appropriate for reproducing and generalizing a desired demonstration trajectory.


Keywords Human-Robot Interaction, Ball Catching, Machine Learning, Movement Primitives, Neural Networks

Abstract Learning approaches are one of the most interesting ways of endowing robots with advanced capabilities in terms of autonomy and adaptability. This dissertation addresses the problem of robot ball catching by focusing on the potential of supervised learning techniques to deal with the demands imposed on the perception and action systems. The first part of this dissertation aims to show that intention prediction from observation of human actions may be an essential skill for robots performing interactive tasks. This work explores the role of early anticipations in improving the prediction ability of a robotic system playing ball catch with a human partner. The source of anticipatory information is the observation of the thrower’s motion before the ball is released. For that purpose, a feed-forward neural network is trained to estimate the initial position and velocity of the ball in flight given a sequence of motion observations during the throwing phase. The 3-degree-of-freedom manipulator adopted in this study reacts to updates in the predicted catching point and time through a Jacobian-based scheme that provides an inverse kinematics solution. Several simulation results demonstrate that the proposed approach outperforms by up to 20% the classical methodology, in which the generation of predictions relies only on information available during the flight phase of the ball. In the second part, this dissertation adopts a bio-inspired movement generation approach to deal with the requirements of online adaptation of the robot arm’s motion. In particular, it studies the mathematical formulation of discrete movement primitives in order to reproduce and generalize a single learned demonstration. The method is validated with a 6-degree-of-freedom robot arm in the V-REP simulation environment. The experiments conducted show that these movement primitives are adequate for reproducing and generalizing a desired trajectory.


List of Figures

2.1 Machine learning categories
3.1 Real hand motion versus polynomial generation on X axis
3.2 Real hand motion versus polynomial generation on Y axis
3.3 Real hand motion versus polynomial generation on Z axis
3.4 Information flow on a feedforward neural network
3.5 Feed-forward neural network in Matlab with a single hidden layer of 5 neurons
3.6 Study of the number of inputs for FNN
3.7 Monte Carlo Cross-Validation - Data randomly split into training and validation datasets, K=3 folds
3.8 Single FNN - Model Validation
3.9 Position FNN - Model Validation
3.10 Velocity FNN - Model Validation
3.11 Velocity X FNN - Model Validation
3.12 Velocity Y FNN - Model Validation
3.13 Velocity Z FNN - Model Validation
3.14 Matlab simulation environment
3.15 Reachable space of the 3-DOF spatial serial manipulator (link parameters: l1 = 0.3 m, l2 = 0.3 m and l3 = 0.3 m) in the posture adopted at the beginning of each trial. Blue points represent the reachable space given joint physical limits defined by the following inequalities: q1 < 105°, 0 < q2 < 150° and q3 < 180°
4.1 Model Validation - Position FNN with thrower on a specific position
4.2 Model Validation - Velocity FNN with thrower on a specific position
4.3 Example of noise sample overlap
4.4 Position variance on training and testing datasets
4.5 Velocity variance on training and testing datasets
4.6 Space coordinates of the ball hitting the floor (Z=0) for the different clusters
4.7 Model Validation - Position FNN - Cluster 1
4.8 Model Validation - Velocity FNN - Cluster 1
4.9 Model Validation - Position FNN - Cluster 2
4.11 Model Validation - Position FNN - Cluster 3
4.13 Performance of the velocity FNN as a function of the number of hand samples used for prediction
4.14 Model Validation - Position FNN - Cluster 2 - 30 Hand Samples
4.15 Model Validation - Velocity FNN - Cluster 2 - 30 Hand Samples
4.16 Model Validation - Position FNN - Cluster 3 - 30 Hand Samples
4.17 Model Validation - Velocity FNN - Cluster 3 - 30 Hand Samples
4.18 Variation of Signal to Noise Ratio
4.19 Switching Prediction Sample / Starting Movement Sample
4.20 Variation of maximum joint velocity of the robot arm
4.21 Closer look at maximum joint velocities
5.1 Minimum jerk trajectory example for training DMPs
5.2 Internal state variables of the DMPs for the minimum jerk trajectory
5.3 DMPs output for the minimum jerk trajectory
5.4 DMPs acceleration variation on the minimum jerk trajectory with representation of the attractor and forcing terms
5.5 DMPs velocity variation on the minimum jerk trajectory with representation of the attractor and forcing terms
5.6 DMPs position variation on the minimum jerk trajectory with representation of the attractor and forcing terms
5.7 DMPs position output for different goals
5.8 DMPs velocity output for different goals
5.9 DMPs position output for different taus
5.10 DMPs velocity output for different taus
5.11 DMPs velocity output for different taus
5.12 Simulation environment using V-REP
5.13 Simulation of 200 throws using DMPs as arm control
5.14 Example of the end-effector trajectory using the DMPs to catch a flying ball


List of Tables

3.1 Single FNN - Test (position/velocity) error with the best model underlined
3.2 Position and Velocity FNNs - Test error with the best models underlined
3.3 Velocity FNNs - Test error with the best models underlined
3.4 Test Errors obtained for different number of FNNs on the anticipation method
4.1 Position and Velocity FNNs - Test error with the best models underlined
4.2 Group Division of the Velocity Components
4.3 Selected Cluster of the Velocity Components
4.4 Success rate of the anticipation method and classical method on the different clusters
4.5 Position and Velocity FNNs - Test error with the best models underlined on Cluster 1
4.6 Position and Velocity FNNs - Test error with the best models underlined on Cluster 2
4.7 Position and Velocity FNNs - Test error with the best models underlined on Cluster 3
4.8 Success Rate using 20 and 30 hand samples for prediction in anticipation method with clustering
4.9 Position and Velocity FNNs - Test error with the best models underlined on Cluster 2 using 30 hand samples
4.10 Position and Velocity FNNs - Test error with the best models underlined on


Chapter 1

Introduction

The interception of a moving object along its trajectory is a challenging task due to demanding spatio-temporal constraints: the visual, planning and control systems must interact so that the catcher reaches the catching position at the right time [1]. Catching a flying ball involves a sequence of control actions that must be performed without errors, including moving the hand to the interception point, adjusting the hand posture and then closing the hand [2]. Furthermore, generating efficient and skilled actions to catch a flying ball, either in humans or robots, requires solutions to several problems found in all control systems: delays, noise and uncertainty. Delays are present in all stages of a control system, so the information about the system itself and the environment rapidly becomes outdated. Noise occurs in both sensors and actuators, limiting the ability of the control system in terms of perception and action. In particular, sensory noise adds uncertainty to the prediction of the ball’s trajectory, which leads to inefficient robot movements. Uncertainty about the state of the world or of the task is also an important factor, creating additional difficulties in the control of the robot’s endpoint.

In this context, the control of the end-effector’s motion can be seen as the result of a trade-off between spatial and temporal accuracy, as well as a trade-off between the errors due to sensory noise and those due to motor noise. From another perspective, there are multiple possible solutions (redundancy) when intercepting a moving object along its trajectory. For example, a flying ball can be intercepted by the catcher’s hand at numerous spatial positions along its trajectory, even if it is restricted to a certain temporal window. Moreover, each spatial target can be reached by diverse joint trajectories and hand postures. Consequently, the redundancy associated with the task can be seen as an additional problem or, instead, as an opportunity to be exploited.

This dissertation was proposed by the Institute of Electronics and Informatics Engineering of Aveiro (University of Aveiro) in the scope of current activities aiming to design and evaluate robotic systems for human-robot interaction and for advanced studies on robot learning. The main focus is the use of standard components integrated in a development environment supported by open-source software.


1.1 Motivation

Catching a flying ball involves the accurate prediction of the ball’s trajectory and the control of the robot’s end-effector in order to intercept the flying ball. On the one hand, humans accomplish this task so effortlessly that they tend to underestimate their own capabilities for extracting information from the environment, learning from multiple sources and predicting in both space and time. On the other hand, humanoid robots have sophisticated control architectures and computational power, but still lack advanced abilities in terms of autonomy and adaptability. Machine learning approaches have been used in several works with some encouraging results in this regard, though dealing with the demands of a complex real-world environment is far from straightforward. This work aims to contribute to the problem of robot ball catching by addressing the potential of supervised learning techniques to deal with the challenges of prediction and motion planning.

The motivation behind this work is based on recent developments related to the prediction of human intentions and bio-inspired approaches for motion generation. Although anticipating human intentions can be essential for robots performing interactive tasks, the subject has not received much attention in the specific context of robot ball catching. Understanding human actions and teaching robots to behave in a human-like way is another challenging task in robotics. This work follows a promising perspective for robot learning in which human demonstrations are represented by movement primitives that, once combined, may result in more complex behaviors. Another motivating factor is the awareness that robot systems will be increasingly present in our daily lives, dominated by low-cost interactive manipulators and standard hardware components. In this line of thought, the incorporation of machine learning techniques into the design of appropriate solutions seems essential to take into account the specific capabilities and limitations of the hardware used.

1.2 Objectives

This dissertation aims to provide advances towards the development of a robotic system able to perform an interactive task with a human partner: catching a flying ball. The main objectives of this dissertation are twofold:

• The first and central objective is to study the role of early anticipations to improve the prediction ability of a robotic system playing ball catching with a human partner. The source of anticipatory information comes from the observation of the thrower’s motion before the ball is released. The key idea behind this research is that an effective solution to the ball-catching problem should involve the use of relevant sources of information as early as possible and not only the continuous refinement of the information extracted during the flying phase.

• The second objective is to explore the use of discrete movement primitives [3] for generating the robot’s movements based on human demonstrations of ball catching. The idea is to follow a promising research direction used to generalize and adapt human demonstrations by adjusting a few open control parameters of a learned model. This is referred to as the modular generation of movements that result from the combination of a set of basic primitives.


Although the target application involves a physical robot, a virtual environment provides a powerful way of simulating the real situation and training for it. At the current stage of development, computer simulations are adopted to allow the design, testing and tuning of different solutions in order to decide which one will be implemented on the real robotic system. For that purpose, the Virtual Robot Experimentation Platform (V-REP) simulator and the Matlab programming environment are adopted.

1.3 Dissertation Outline

The dissertation is organized in six chapters as follows:

• Chapter 1 provides an introduction to this dissertation, including the motivation driving the proposed work and its objectives.

• Chapter 2 reviews related work and main concepts, providing background knowledge on machine learning, dynamic movement primitives and the V-REP simulator.

• Chapter 3 discusses the design of the feedforward neural network that provides an approximate mapping between the sequential observation of the thrower’s hand and the initial conditions of the ball at the start of the ballistic phase.

• Chapter 4 describes the model archetype evolution, along with the impact of restricting the training dataset in order to improve the model accuracy. A critical comparison between the results obtained using early predictions and those using the classical methodology is also presented.

• Chapter 5 presents the concept of motion generalization using dynamic movement primitives, as well as its integration in the work with a general discussion and performance analysis.

• Chapter 6 draws the main conclusions for this dissertation and discusses some possible directions of future work.


Chapter 2

Background and Context

This chapter provides a review of the literature and the context of the dissertation work. Section 2.1.1 presents an overview of human studies with particular emphasis on two different approaches in which the control strategies are commonly divided and the role of anticipation mechanisms for intercepting moving objects. From the viewpoint of robotics, numerous approaches for robot ball catching have been developed with their own solutions to the problems of perception and action. Section 2.1.2 briefly discusses the most relevant works found in the literature, focusing on how they predict the trajectories of the moving object and how they generate the robot’s motions. Finally, Section 2.2 provides the context for the work and summarizes the proposed approach to achieve the objectives outlined in Chapter 1.

2.1 Literature Review

2.1.1 Human interception of moving objects

A subject that is relevant for this dissertation is the study of human ball catching, namely to find evidence of the role of early anticipation and to understand the motion control strategies used by humans for catching flying objects. Currently, there is a debate between predictive approaches, also referred to as model-based approaches or ballistic control, and prospective approaches, also referred to as online approaches or feedback control [4]. On the one hand, a predictive control strategy relies on the initial observations of the ball in order to estimate a ballistic model of its trajectory, allowing the prediction of the most probable catching point and time. The predictive strategy seems to be necessary when the ball has a short flight time, to compensate for the time required to complete the motion to the catching point. For example, this is a typical strategy in baseball and cricket games [5], [6]. On the other hand, prospective control is based on a feedback control law aiming to minimize the distance between the flying object and the catcher’s endpoint. This type of strategy seems useful when the ball’s trajectory is difficult to estimate from initial conditions, but there is enough time to perform adjustments before the contact point, as is the case for outfield players in baseball games [5]. Katsumata & Russell [7] evaluated predictive and prospective control strategies while interrupting the availability of visual information of a falling ball. The participants in the study were asked to hit a ball dropped from different heights under two conditions: full vision and partial occlusion. The most important findings, obtained by analyzing the initiation and duration of the movement, provide evidence for predictive control in initiating the arm’s swing and prospective control in guiding the bat used to hit the ball. More recently, the studies of Zhao & Warren [4] and de la Malla & López-Moliner [8] also investigated the combination of predictive and online visual information throughout the perception-action cycle.

Recent works have emphasized the importance of predicting human actions in many sports. For example, Loffing and Cañal-Bruland [9] reported recent works and open issues, discussing the role of experience in anticipation and the variations in gaze behavior. In ball games, like handball and basketball, humans divide their attention between the ball and other aspects of the game such as the movement of the other players. Other studies have investigated the human gaze behavior during ball catching [10], [11]. López-Moliner and Brenner [11] studied the importance of getting information about the ball at particular moments when participants needed to perform a secondary task. The authors verified the existence of great flexibility in using information, meaning that each participant decides when visual information would be useful for any particular task. Faisal and Wolpert [12] studied how subjects trade off sensory and movement uncertainty when catching a falling ball. The task was formulated using a probabilistic model of the perception-action cycle to predict the optimal decision. Findings revealed an optimal switching time between the perception phase and the subject’s control of the catching task, according to the individual sensory and motor uncertainties. Another interesting finding was reported by Stone et al. [13], who manipulated participants’ access to earlier information regarding both the thrower’s action and the ball’s trajectory. The performance of each participant was evaluated under three experimental conditions: only information about the thrower’s action is available, only information about the flying ball is available, and both advanced visual information and the ball in flight are available. By recording whole-body data, the authors investigated the influence of these experimental conditions on postural control. Findings revealed that movements were initiated earlier when advanced visual information was available, prior to ball flight, resulting in more controlled actions and improved catching performance.

In summary, it seems clear that, in humans, the interception of moving objects involves the progressive use of early relevant sources of information. Since the reaction required to catch an object takes time, it is presumably advantageous to make accurate predictions before the object is released and refine them as the object approaches the catcher. Furthermore, in sports, there is growing evidence that anticipation skills benefit from visual experience and task expertise [9].


2.1.2 Robot ball catching

From the viewpoint of robot ball catching, numerous approaches have been developed with their own solutions to deal with the two main problems associated with this specific task. The first problem is the accurate estimation of the ball’s trajectory for predicting the catching point and time. The second problem is the online re-planning of the robot’s motion such that the end-effector can catch the ball in time. This literature review on robot ball catching is based on a wide range of sources, including some review references such as [14], [15], [16]. More recently, the works of Kim [17] and Kim et al. [18] provide a good overview of the most important developments by focusing on the approaches used to predict the trajectories of the flying object, to determine the catching posture (i.e., the interception point and the hand’s orientation) and to plan and control the robot’s motion. In line with this, the present subsection follows a chronological order starting with the pioneering work of Slotine and colleagues [19], [20] and ending with the remarkable works from Aude Billard’s lab [18], [21].

The work described by Hove and Slotine [19] and Hong and Slotine [20] used the 4-DOF WAM arm (Barrett Technology) and an active vision system with output information at 60 Hz. This work considers that the catching point corresponds to the point of the ball’s trajectory closest to the base of the robot, while the end-effector assumes a perpendicular orientation with respect to the ball’s trajectory. The trajectory to catch the ball is planned in Cartesian space using a 3rd-order polynomial function, which requires an inverse kinematics algorithm running in the control loop. The best performance results were found to be 70-80% success for similar launches.

Nishiwaki et al. [22] address both the falling ball task and the ball catching task with the humanoid robot Saika, using an active vision system consisting of two CCD cameras. In this work, the end-effector reaches the catching point through an inverse kinematics model using a three-layered neural network.

Frese et al. [23] used a 7-DOF DLR-LWR-II arm equipped with a basket and an off-the-shelf stereo vision system (a combination of two cameras) that acquires and processes images at 50 Hz. The catching point selection considers two criteria: first, a location that is near the robot’s end-effector; second, a catching point far enough from the robot to avoid physical constraints such as joint limits. The catching configuration is calculated to ensure a perpendicular orientation of the basket with respect to the ball’s trajectory. In their experiments, the robot succeeded in 2/3 of the attempts to catch the ball. Most of the failures occurred due to the limited horizontal field of view (FOV) of the system (camera and lens), resulting in the system seeing the ball too late.

Riley and Atkeson [24] investigated the problem of ball catching by creating human-like behaviors for a humanoid robot. The robot arm, equipped with a baseball glove, is used to catch the ball, but without considering the end-effector’s orientation. A Quick MAG stereo color vision system is used to detect and track the ball at 60 Hz. In this work, the catching point is derived from the intersection of the estimated ball’s trajectory with a horizontal plane placed at a certain height.
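The last catching-point rule above (intersection of the estimated trajectory with a horizontal plane) can be illustrated with a minimal Python sketch. This is not code from any of the cited works: it assumes a drag-free parabolic model with g = 9.81 m/s^2, and the function name `catch_point_on_plane` and its arguments are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)


def catch_point_on_plane(p0, v0, z_catch):
    """Return (t, x, y) where a drag-free ball crosses height z_catch while descending.

    p0 and v0 are (x, y, z) tuples with the ball's position (m) and velocity (m/s)
    at release. Returns None if the ball never reaches z_catch after release.
    """
    x0, y0, z0 = p0
    vx, vy, vz = v0
    # Solve z0 + vz*t - 0.5*G*t^2 = z_catch for t.
    disc = vz * vz + 2.0 * G * (z0 - z_catch)
    if disc < 0.0:
        return None  # the apex of the trajectory lies below the catching plane
    t = (vz + math.sqrt(disc)) / G  # larger root: crossing on the way down
    if t <= 0.0:
        return None
    return t, x0 + vx * t, y0 + vy * t


# Hypothetical throw: released at 1.5 m height, caught on a plane at 1.0 m.
print(catch_point_on_plane((0.0, 0.0, 1.5), (3.0, 0.0, 2.0), 1.0))
```

Taking the larger root of the quadratic selects the crossing on the descending branch, which is the usual choice when the plane sits below the apex of the throw.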

Most of the works described above use a parabolic model for the trajectory of the flying ball, whose parameters are subsequently estimated through recursive least squares [20], [24], [25]. A few works also incorporate air drag into the ballistic model [23]. The most common solution to obtain the desired catching point and time considers the point closest to the end-effector within the intersection of the ball’s trajectory with the robot’s reachable space. From the viewpoint of the robot’s motion, many works use polynomials that satisfy constraints at the initial and final points (i.e., boundary conditions) in order to generate the desired trajectory [26]. Learning from demonstrations by a human teacher is another approach gaining increased interest for trajectory planning [24], [27], [15]. The motion of the gripper (interconnected with the arm motion) is important to ensure a stable catch, but it is a problem not addressed in many of the above-mentioned works. The solution adopted in most recent works uses a unified hand-arm strategy in which the closure of the fingers is triggered when the end-effector comes within a given distance of the flying object [20], [28], [26], [24]. Bäuml et al. [16] addressed the joint control of a 7-DOF arm and a 12-DOF hand as a nonlinear optimization problem subject to nonlinear constraints. Later, Bäuml et al. [14] used the mobile humanoid robot Rollin’ Justin for catching up to two balls that are thrown simultaneously. All degrees of freedom are used to accomplish the task, including the arms, the torso and the mobile platform itself. At the same time, the 2-DOF pan-tilt unit keeps the ball in the field of view of the stereo vision system. More recently, Kim et al. [18] presented a learning framework to teach a robot to catch fast flying objects with uneven shapes through the observation of demonstrations encoded by dynamical systems (DS). In this way, the robotic system learns both the objects’ dynamics and a model of the arm’s movements. Based on the same framework, Salehian et al. [21] proposed a strategy in which the robot’s hand follows the object’s trajectory for a short period of time, gaining more time to close the fingers. The control law is expressed as a linear parameter varying (LPV) system, whose parameters are approximated using Gaussian mixture models (GMMs).
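Returning to the parabolic trajectory model mentioned at the start of this summary, the following Python sketch shows how the initial position and velocity of a ball could be estimated from timestamped position samples. It is an illustration only and rests on several assumptions: it uses batch ordinary least squares instead of the recursive least squares used in [20], [24], [25], it ignores air drag, it treats g = 9.81 m/s^2 as known, and the function names `fit_line` and `fit_ballistic` are hypothetical.

```python
G = 9.81  # gravitational acceleration in m/s^2 (assumed known)


def fit_line(ts, ys):
    """Ordinary least-squares fit of y = a + b*t; returns (a, b)."""
    n = len(ts)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    s_tt = sum((t - t_mean) ** 2 for t in ts)
    s_ty = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    b = s_ty / s_tt
    return y_mean - b * t_mean, b


def fit_ballistic(ts, points):
    """Estimate initial position and velocity from timestamped (x, y, z) samples.

    Requires at least two samples taken at distinct times.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Adding the known gravity term back makes the vertical axis linear in t.
    zs = [p[2] + 0.5 * G * t * t for p, t in zip(points, ts)]
    x0, vx = fit_line(ts, xs)
    y0, vy = fit_line(ts, ys)
    z0, vz = fit_line(ts, zs)
    return (x0, y0, z0), (vx, vy, vz)


# Hypothetical noise-free samples at 60 Hz from a known throw.
ts = [i / 60.0 for i in range(10)]
pts = [(0.1 + 3.0 * t, -0.2 + 0.5 * t, 1.5 + 2.0 * t - 0.5 * G * t * t) for t in ts]
print(fit_ballistic(ts, pts))
```

Because gravity is assumed known, each axis reduces to a straight-line fit; a recursive formulation would update the same estimates sample by sample instead of refitting the whole batch.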

2.1.3 Previous studies at IEETA

This dissertation was proposed in the context of current activities aiming to design and evaluate robotic systems for use by or with humans. An objective of the study was the use of standard components, to make it possible to replicate the setup with minimum effort. Two previous studies [29], [30] provide an important contextualization of these earlier activities. The most recent and directly related one [30] sought to develop a testbed for a ball catching task involving an upper-body humanoid robot and a human partner. The study focused on the development of the hardware and software infrastructures by employing off-the-shelf components, namely an educational/research manipulator Cyton Gamma 1500 [31] and a Kinect depth sensor. The development environment is supported by the Robot Operating System [32] framework under Linux, using the C/C++ programming languages. Several computational tools have been developed in previous works, including detection and tracking algorithms, estimation methods based on Kalman filtering, planning and simple point-to-point motion control of the dual-arm humanoid torso.


2.2 Computational Tools

2.2.1 Machine learning

Machine Learning (ML) is a subfield of artificial intelligence that is concerned with the design, analysis and implementation of programs that learn from examples or experience, without being explicitly programmed [33]. ML consists in making predictions through methods and algorithms from the mathematical optimization field, sharing a close relationship with computational statistics. Data mining [34], [35] is a well-known subfield of machine learning that focuses more on exploratory data analysis, capable of uncovering hidden patterns and establishing behavioral profiles for multiple purposes, including detection of anomalies, data clustering and finding relationships between variables. The uncovering of trends in data allows the production of accurate predictions, which can then be used to make reliable decisions by specialists in different fields, such as marketing, financial services, health care and telecommunications. The prediction capabilities of ML algorithms make them a very suitable approach for creating models of complex and unknown systems.

In Figure 2.1 are illustrated the major machine learning approaches, depending on the available information (data) [36]:

• Supervised learning: The machine receives a dataset of inputs along with the desired outputs (labeled data). The goal is to discover the underlying rule that maps the inputs into the outputs.

• Unsupervised learning: The machine receives a dataset of inputs without the respective outputs (unlabeled data). The goal is to explore the statistical nature of the data, discover patterns and hidden structure, and eventually cluster the data into distinct groups.

• Reinforcement learning: A computer program (agent) interacts with a dynamic environment in order to achieve a certain goal. The agent receives feedback in the form of rewards and penalties as it explores the possible actions.

Semi-supervised learning is a mixture of supervised and unsupervised learning in which only part of the data is labeled. Labeling is an expensive procedure, so providing unsupervised learning with some labeled examples can significantly improve the learning outcome.

Nowadays, ML algorithms are embedded in a wide variety of mainstream technologies where the following tasks are typically solved:

• Classification: The outputs are discrete values (representing different classes) and the inputs (the examples) belong to one of the classes. The goal is to train a model that correctly assigns new examples to their respective class. Recognition of spam versus regular emails, object recognition in images and document classification are typical examples of classification.

• Regression: The outputs are continuous values and the goal is to fit a model that approximates the relation between input and output variables.

Classification and regression tasks belong to the supervised ML approach.

• Clustering: Based on some measure of similarity (for example, the Euclidean distance), data are divided into groups. The algorithms seek to maximize the distance between dissimilar examples and minimize the distance between similar examples.

• Dimensionality Reduction (DR): DR algorithms estimate the importance of data features (inputs) and, based on a certain criterion, select the most influential ones. Principal Component Analysis (PCA) is a popular DR method.

The main objective of machine learning is to generalize, that is, to use the experience acquired during training to perform accurately on new and unseen tasks [38] [39]. For good generalization, the training dataset must be representative of the space of occurrences, so that the machine can build a general model of this space that produces sufficiently accurate predictions in new cases. In computer science, the branch responsible for the analysis and performance evaluation of machine learning algorithms is called computational learning theory. Because training datasets are finite and may not cover the whole space of possibilities, computational learning theory uses probabilistic bounds on the performance of algorithms; the bias-variance decomposition is one way to quantify the generalization error. In the context of generalization, both the complexity of the model being trained and the complexity of the task must be taken into account. If the model is too complex for the task at hand, it will overfit the data, leading to poor generalization. On the other hand, if the model is too simple for the task, it underfits the data and the generalization error also increases.

The rise of new technologies, and the large amount of data they generate on a daily basis, created a need for rapid and autonomous data processing, leading to the resurgence of machine learning. The existing evidence provides insight into the capabilities of machine learning in solving challenges that have proved too complex for humans, thereby contributing to the advancement of several complex research fields.
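For reference, the bias-variance decomposition mentioned above can be written as follows for squared-error loss. The setting is the usual textbook one and is an assumption here, since it is not stated in the text: the data follow $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$, $\hat{f}$ is the model learned from a random training set, and the expectation is taken over training sets and noise.

```latex
\mathbb{E}\!\left[(y - \hat{f}(x))^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\!\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

A model that is too simple tends to have a large bias term (underfitting), while a model that is too complex tends to have a large variance term (overfitting).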

This dissertation could draw on a wide range of machine learning techniques; however, it was necessary to define which approach would be most important for ensuring satisfactory results. The task addressed in this dissertation is a regression problem; therefore, a supervised learning approach with artificial neural networks was used to map the input data to the desired outputs. Although other fields of machine learning fall outside the scope of this dissertation, some of them present interesting alternative methods and will be mentioned as possible future work.

2.2.2 Dynamic movement primitives

Dynamic movement primitives (DMPs) are a method of trajectory planning presented by Ijspeert et al. in 2002 [40], and then updated in 2013 [3]. The work was motivated by the desire to find a way of representing complex movement behaviors on humanoid robots that was robust against perturbations and could be flexibly adjusted without concerns about instability. For some time, it has been thought that complex movement behaviors are formed by groups of primitive actions executed in sequence. DMPs were proposed as a mathematical formalization of these basic primitives in which the basic idea is to use a specific dynamical system with stable properties and add a forcing term, making it follow a desired trajectory. This forcing term is constituted by a set of adjustable weights that can be learned with statistical learning techniques.

There are two kinds of DMPs: discrete primitives based on a point attractor system and rhythmic primitives using a limit cycle. In any case, the DMPs possess unique properties that make them ideal in the generalization and generation of trajectories:

• Spatial Scaling: DMPs allow trajectories to be adapted to different positions in the workspace, i.e., once the system has been set up to follow a desired trajectory to a specific goal, it is possible to move that goal in space and obtain a scaled version of the planned trajectory.

• Temporal Scaling: DMPs are capable of following a planned trajectory at different speeds while maintaining the same path. This means that, once the time of arrival at the goal has been set, it is possible to change the trajectory velocity in order to arrive at the goal at a different time.

Having a system capable of learning a desired path and adapting it to different situations is a valuable tool for imitation learning and for generalization purposes. Nowadays, there are considerable efforts to extend this framework, including incorporating system feedback, exploring spatio-temporal coupling among DMPs and integrating it with reinforcement learning algorithms.
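For the discrete (point attractor) case, the DMP formulation is commonly written as shown below. The symbols follow the usual DMP notation and are not defined elsewhere in this text ($y$ is the position, $z$ the scaled velocity, $x$ the phase variable, $g$ the goal, $y_0$ the start position, $\tau$ the time constant, $w_i$ the learnable weights and $\psi_i$ Gaussian basis functions), so this should be read as a reference sketch rather than as the exact form used later in this dissertation.

```latex
\tau \dot{z} = \alpha_z \left( \beta_z (g - y) - z \right) + f(x), \qquad
\tau \dot{y} = z, \qquad
\tau \dot{x} = -\alpha_x x

f(x) = \frac{\sum_{i=1}^{N} \psi_i(x)\, w_i}{\sum_{i=1}^{N} \psi_i(x)}\; x\, (g - y_0), \qquad
\psi_i(x) = \exp\!\left( -h_i (x - c_i)^2 \right)
```

In this form, moving the goal $g$ rescales the forcing term through the factor $(g - y_0)$, which corresponds to spatial scaling, while changing $\tau$ stretches or compresses the time axis without altering the path, which corresponds to temporal scaling.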

2.2.3 Simulation environment using V-REP

Virtual environments provide a natural and transparent way for simulating and training real situations. The Virtual Robot Experimentation Platform (V-REP) [41] is a powerful software tool for modeling, programming and simulating robotic systems. V-REP provides an integrated development environment, in which each object is individually controlled using an embedded script, remote API clients or plugins.

V-REP can be used as a stand-alone application or embedded into a main client application. Its elaborate API makes V-REP a strong candidate for embedding into higher-level applications. Some key functionalities, described in [41], are summarized below:

• Cross-platform and portable: V-REP permits the creation of portable, scalable and easily maintainable content by saving the full model, simulation scene and control code in a single portable file.

• Programming methodologies: simulator and simulations are completely adaptable, with 6 programming approaches that can even work as a unit (Remote API clients, Plugins, Add-ons, ROS Nodes, Embedded Scripts, Custom solutions).

• Remote API: more than 100 embeddable V-REP functions that allow a simulation, or the simulator itself, to be controlled remotely.

• Dynamics/physics engine: four physics engines (Bullet, ODE, Newton and Vortex) that can be switched at any time according to the simulation needs, making it possible to simulate real-world physics and object interactions (collision, grasping, bounce, etc.).

• Calculation modules: V-REP offers powerful calculation functionalities and computation modules, including the collision-detection module, the inverse kinematics module, the geometric constraint solver module, the dynamics module and the path and motion planning module.

• Sensing simulation: V-REP allows the simulation of several different sensors, such as proximity, vision and collision sensors, among others. The proximity sensor simulation calculates the accurate minimum distance within a customizable volume. Vision sensors are simulated with a large number of available filter components that can be customized, combined and extended to achieve the desired image processing.

• Data recording and visualization: Elaborate graphing possibilities to display in real-time simulation, with video recording option for future reference and analysis.

• Custom user interfaces: Unlimited number of fully customizable user interface elements, with integrated edit mode.

Programming outside V-REP can be done in several programming languages, such as Matlab, Java, Python, C/C++, Lua, Octave and Urbi. For this type of programming, dedicated commands and signals are used to send data to and receive data from V-REP. The Remote API and the Regular API are the two types of commands used, together with signals, to exchange data.

V-REP provides a powerful and flexible inverse kinematics (IK) calculation module that allows the manipulation of virtually any mechanism [42]. The inverse kinematics calculation module is essential for finding the joint angles corresponding to a specific position and/or orientation of a given body element. In this dissertation, the body element of interest is the end-effector of the robot arm, and the IK module is used to convert the predicted catching point into joint-space commands.

V-REP is an important tool for this dissertation because it can be controlled remotely from Matlab (Remote API) and provides a very powerful simulation environment, thanks to its dynamics engines (the Bullet physics engine was selected for this dissertation) and calculation modules, of which the inverse kinematics module is the most important.

2.3 Final Remarks

In recent years, the recognition of human actions and the recognition of human intentions have been two related subjects that caught the attention of the scientific community, namely in the field of Human-Robot Interaction (HRI) [43] [44]. This dissertation explores a particular view of the problem of robot ball catching by addressing the importance of earlier anticipation based on the observation of the thrower’s movement (i.e., before the ball is released). This is an important aspect that, in the specific context of the robot ball catching literature, has not received much attention. Indeed, it is missing in all of the above-mentioned works. The key idea behind this research is that the acquisition of perceptual-motor skills should involve not only the refinement of information extraction, but also the progressive use of earlier relevant sources of information.

On the robot control side, a common approach found in the literature uses supervised learning for the imitation of human demonstrations. The key idea in many of these works is to have the robot learn movements from human demonstrations and then optimize them using reinforcement learning. This dissertation follows a similar approach by using dynamic movement primitives for the generalization of catching motions, which may serve as a foundation for future work applying reinforcement learning to this specific task.

Chapter 3

Neural Network Models for Early Anticipation Skills

This chapter investigates the hypothesis that early prediction from observation of the thrower’s intention can be an essential ability for robots performing a ball catching task. For that purpose, two main questions are addressed. The first concerns the neural network model required for solving the particular problem at hand. This study proposes a multi-model approach, based on feedforward neural networks (FNNs), that provides an approximate mapping between the sequential observation of the thrower’s hand and the initial conditions of the ball at the start of the ballistic phase. The adopted FNNs represent a trade-off between model complexity and prediction accuracy. However, the accuracy of the network model does not relate directly to the ball catching performance. Thus, the second question addressed in this chapter is the evaluation of the prediction accuracy provided by the selected FNN model in the context of a ball catching task. For that purpose, the overall catching scenario is simulated using synthetic data and a 3-DOF robot arm.

3.1 Overall Approach

To understand the early anticipation method, it is necessary to consider two motion phases during a ball throw.

• Phase A - refers to the sequence of movements performed by the thrower’s hand before the ball’s release (throwing motion). This phase is the core of the early anticipation method, in which the robot tries to predict the initial conditions of the ball (position and velocity) from the thrower’s hand motion.

• Phase B - refers to the ball’s motion after release (parabolic ball motion). During this phase the robot uses the classical method to update the prediction made during phase A.

The state of the ball during phase A is represented as $S_t^{(A)} = \{S_{pos,t}^{(A)}, S_{vel,t}^{(A)}\}_{t=0,\dots,T_1}$, where $S_{pos,t}^{(A)} \in \mathbb{R}^3$ and $S_{vel,t}^{(A)} \in \mathbb{R}^3$ denote the ball position and velocity, respectively, at time $t$. $T_1$ represents the moment of ball release, and therefore the transition point from phase A to phase B.

The state of the ball during phase B follows the same logic, but without taking into account the ball’s velocity. It is represented as $S_t^{(B)} = \{S_{pos,t}^{(B)}\}_{t=T_1,\dots,T}$, where $T$ represents the moment when the ball hits the floor.

The anticipation method aims to use the visually acquired information about the initial movements of the thrower’s hand to predict the ball’s position and velocity at the moment of release. This prediction is then used to estimate the ball trajectory in the air by means of a parabolic function approximation.

$$S_{T_1}^{(A)} = f\left(S_{t=0,\dots,L}^{(A)}\right), \quad L < T_1 \qquad (3.1)$$

The goal is to model the unknown function $f(\cdot)$, using only the first $L$ samples of $S_t^{(A)}$, by means of supervised learning with feed-forward neural networks (FNNs).

3.1.1 Classical method

The classical method in robot ball catching consists of using the information acquired during phase B to predict the full trajectory of the flying ball: the samples observed so far, $S_{t=T_1,\dots,t_i}^{(B)}$, are used to estimate $S_t^{(B)}$, where $t_i$ represents the current time of the ball. The prediction is obtained by polynomial curve fitting of the three position components of the ball ($X_{ball}$, $Y_{ball}$, $Z_{ball}$). While the $X_{ball}$ and $Y_{ball}$ components are approximated by straight lines (first-order polynomials), the $Z_{ball}$ component is approximated by a parabola (second-order polynomial).

The classical method presents a few problems that this dissertation aims to solve. The main problem is that several samples of the flying ball are needed to obtain a reliable prediction of its trajectory, which leaves insufficient time for the robot’s arm to move to the exact catching position. This dissertation exploits the information acquired before the ball release (phase A) to make an earlier prediction of the ball trajectory, and then switches to the classical method at some point of the ball’s flight. Two studies must therefore be conducted. On the one hand, it is necessary to study when the robot’s arm should start moving if only the classical method is used: since the first predictions are quite inaccurate, they may lead the robot to follow very inefficient trajectories, for instance moving in one direction and then in the exact opposite direction. On the other hand, it is fundamental to study when the robot should switch from the prediction obtained in phase A to the prediction obtained during phase B, which depends on how many samples the classical method needs to produce a prediction that is more accurate than that of the anticipation method.
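The curve fitting used by the classical method can be sketched as below, in Python with the standard library only. This is a minimal illustration, not the implementation used in this work: the helper names, the 100 Hz sampling rate and the noise-free example samples are all hypothetical, and the least-squares fit is solved through the normal equations purely for brevity.

```python
def solve_linear(a, b):
    """Solve a small linear system a*x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def polyfit(ts, ys, degree):
    """Least-squares polynomial fit; returns coefficients c[0] + c[1]*t + ..."""
    n = degree + 1
    ata = [[sum(t ** (i + j) for t in ts) for j in range(n)] for i in range(n)]
    aty = [sum(y * t ** i for t, y in zip(ts, ys)) for i in range(n)]
    return solve_linear(ata, aty)


def fit_trajectory(ts, xs, ys, zs):
    """Fit straight lines to X and Y and a parabola to Z."""
    return polyfit(ts, xs, 1), polyfit(ts, ys, 1), polyfit(ts, zs, 2)


def predict(coeffs, t):
    """Evaluate a fitted polynomial at time t."""
    return sum(c * t ** k for k, c in enumerate(coeffs))


# Hypothetical noise-free samples of a flying ball, observed at 100 Hz.
g = 9.810665
ts = [0.01 * k for k in range(10)]
xs = [0.5 + 4.0 * t for t in ts]
ys = [0.1 - 0.3 * t for t in ts]
zs = [1.5 + 3.0 * t - 0.5 * g * t ** 2 for t in ts]
cx, cy, cz = fit_trajectory(ts, xs, ys, zs)
```

With noisy measurements, the fitted coefficients become reliable only after several samples, which is the limitation of the classical method discussed in this section.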

3.1.2 Human motion and ball trajectory generation

The overall catching scenario considers a human subject (throwing the ball) placed around 3-4 m away from the robot (catching the ball). Ball throws were generated by a commonly used method in which the ball in flight follows a parabolic motion (3.2), neglecting air resistance.

$$x = x_0 + v_{x0}\,t; \quad y = y_0 + v_{y0}\,t; \quad z = z_0 + v_{z0}\,t - 0.5\,g\,t^2; \quad g = 9.810665\ \mathrm{m/s^2} \qquad (3.2)$$

The generated ball throwing motions were based on motion captured with a VICON system during a ball catching game between two human subjects. Human-like throws were generated as polynomials (in 3D space) that approximate the hand movement in terms of position, velocity and acceleration. The reason for generating these artificial motions is to provide sufficient data for applying supervised machine learning. Figures 3.1, 3.2 and 3.3 confirm that the artificially generated data fit the real hand movements well and can therefore be used to study anticipation techniques in robot ball catching.
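As a rough illustration of equation (3.2), the following Python sketch evaluates the ball position at a given time and the time at which the ball reaches the floor (z = 0). The function names and the example release state are hypothetical; only the motion model and the value of g come from the text.

```python
import math

G = 9.810665  # gravitational acceleration used in equation (3.2), m/s^2


def ball_position(p0, v0, t):
    """Ball position at time t after release, following equation (3.2)."""
    x0, y0, z0 = p0
    vx0, vy0, vz0 = v0
    return (x0 + vx0 * t,
            y0 + vy0 * t,
            z0 + vz0 * t - 0.5 * G * t ** 2)


def time_to_floor(z0, vz0):
    """Positive root of z0 + vz0*t - 0.5*G*t^2 = 0, valid for z0 >= 0."""
    return (vz0 + math.sqrt(vz0 ** 2 + 2.0 * G * z0)) / G


# Hypothetical release state: position in m, velocity in m/s.
p0 = (0.0, 0.0, 1.5)
v0 = (4.0, 0.0, 3.0)
t_land = time_to_floor(p0[2], v0[2])
landing_point = ball_position(p0, v0, t_land)
```

Because air resistance is neglected, the release position and velocity fully determine the trajectory, which is why the accuracy of the predicted release state matters so much in the following sections.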

Figure 3.2: Real hand motion versus polynomial generation on Y axis.


3.2 Neural Network Design

3.2.1 Neural network topology

A feed-forward neural network (FNN) [45] is a particular type of artificial neural network (ANN) in which information flows from the input nodes to the output nodes, passing through the hidden layers, without any feedback connection, as illustrated in Figure 3.4. FNNs have been widely used in various machine learning applications such as data classification or regression, time-series processing, computer/robot vision and autonomous car driving. Their success is due to inherent FNN properties such as:

• Training - relatively simple training procedure.

• Non Linearity - able to confidently approximate non-linear input-output mappings.

• Robustness - smoothly degrading performance in the presence of noise or perturbations.

Figure 3.4: Information flow on a feedforward neural network (adapted from [46]).

Given the nonlinear nature of the robot ball catching task, the FNN seems a promising modeling approach to study. In the general FNN configuration, the first $L$ samples of the ball dynamical state during the throwing stage, i.e., the ball position ($S_{pos,1\dots L}^{(A)} \in \mathbb{R}^3$) and the ball velocity ($S_{vel,1\dots L}^{(A)} \in \mathbb{R}^3$), are the FNN inputs. The FNN outputs are the ball position ($S_{pos,t=T_1}^{(A)} \in \mathbb{R}^3$) and the ball velocity ($S_{vel,t=T_1}^{(A)} \in \mathbb{R}^3$).

3.2.2 Hyper-parameters

A number of hyper-parameters need to be selected, before starting the neural network training procedure. Typically, the FNN hyper-parameters, regarding the network structure, are the number of hidden layers, the number of neurons on each hidden layer, neuron activation function type and the number of network inputs and outputs. The cost function, the training function, maximum number of training iterations, number of iterations for early stopping, size of the dataset and the division of the dataset in training and validation are the hyper-parameters with respect to the training process.

Some of the parameters were set up as the most typically chosen options for non-linear regression problems. For example, the mean square error (MSE) was used as the cost function to assess the network performance. The scaled conjugate gradient (SCG) was preferred as the training function following the conclusions of [47] that SCG demonstrates above average performance over a wide variety of problems. The hyperbolic tangent sigmoid activation function was selected for the nodes in hidden layers and linear activation function for the output layer, as illustrated in Figure 3.5.

The maximum number of iterations for training was set to a sufficiently high number (5000) so that the network has a better chance to reach and stabilize around the global minimum. The over-fitting risk was handled by the early stop method, i.e., stop the training after 250 iterations of increasing validation error.

Figure 3.5: Feed-forward neural network in Matlab with a single hidden layer of 5 neurons.

A Cross Validation (CV) dataset of 1000 ball throws was generated and divided into 75% for training and 25% for validation. The split is done randomly, so each trained FNN has a different training and validation dataset. The dataset used for testing has 500 ball throws and is the same for every FNN, with the guarantee that it was never involved in the training process.

A few tests were enough to observe that an FNN with a single hidden layer would suffice and that no more than 20 neurons were required to solve the problem at hand. Multiple hidden layers or a larger number of neurons did not improve the FNN prediction accuracy and, in some cases, even deteriorated it.

The number of inputs depends on the number of samples required during phase A, previously denoted by $L$. Figure 3.6 presents an initial study of the relationship between the performance error of the FNN (MSE) and the number of hand samples used as inputs ($L$). This study was conducted using a single hidden layer FNN with 10 neurons. The results shown in Figure 3.6 represent the mean performance of 20 FNNs with the same layout but different initial conditions. The dataset used in this study had 1000 ball throws and was randomly divided into 70% for training, 15% for validation and 15% for testing, so that each trained FNN has different training, validation and testing datasets. The results show some fluctuations; however, as expected, the mean square error decreases as the number of input samples increases. The choice of $L$ involves a trade-off: the earlier the FNN produces a prediction, the better the performance of the anticipation method, but a certain accuracy is required, otherwise the anticipation method may perform worse than the classical method. Taking this into account, a middle ground was chosen for the value of $L$: the multidimensional input uses the first 20 samples acquired during phase A, each with three position and three velocity components, so the FNN has 20 × 6 = 120 inputs.

Figure 3.6: Study of the number of inputs for FNN.

$$S_{1,\dots,20}^{(A)} = \{S_{pos,1,\dots,20}^{(A)}, S_{vel,1,\dots,20}^{(A)}\} \in \mathbb{R}^{120} \qquad (3.3)$$
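The construction of the 120-dimensional input of equation (3.3) can be sketched as follows. The function name and the example samples are hypothetical, and the ordering of the elements (all positions first, then all velocities) is an assumption, since the text does not specify how the samples are arranged in the input vector.

```python
def build_fnn_input(positions, velocities, n_samples=20):
    """Concatenate the first n_samples position and velocity samples.

    positions, velocities: sequences of (x, y, z) tuples from phase A.
    Returns a flat list with 6 * n_samples values (120 for n_samples = 20).
    The ordering (positions first, then velocities) is an assumption.
    """
    if len(positions) < n_samples or len(velocities) < n_samples:
        raise ValueError("not enough phase-A samples")
    flat = []
    for sample in positions[:n_samples]:
        flat.extend(sample)
    for sample in velocities[:n_samples]:
        flat.extend(sample)
    return flat


# Hypothetical hand samples: 30 observations of position and velocity.
positions = [(0.01 * k, 0.0, 1.0 + 0.02 * k) for k in range(30)]
velocities = [(1.0, 0.0, 2.0)] * 30
x_input = build_fnn_input(positions, velocities)
```

Only the first 20 of the 30 available samples are used, which reflects the requirement that the prediction be available before the ball is released.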

3.2.3 Model validation

A variant of the widely used Monte Carlo Cross-Validation (MCCV) was implemented in this study for FNN model validation. The MCCV method consists of randomly splitting the dataset into training and validation sub-sets. Each dataset split corresponds to an experiment (fold). The network model is trained with the training sub-set of each fold, and its prediction accuracy is checked with the validation sub-set of that fold. Figure 3.7 illustrates the MCCV data division for 3 folds (experiments). This model validation method ensures that the ratio between the training and validation sub-sets is independent of the number of folds. However, some parts of the data may never be selected for a validation sub-set, whereas others may be selected more than once. This drawback can be mitigated by increasing the number of folds, which increases the probability that each data sample (a single ball catching episode) is part of the validation sub-set in at least one fold. On the other hand, too many folds mean a prohibitively long training time; therefore, in this study, K=50 folds was considered a reasonable compromise.

Figure 3.7: Monte Carlo Cross-Validation - Data randomly split into training and validation datasets, K=3 folds (adapted from [48]).
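The MCCV data division can be sketched as below. This is a minimal illustration under stated assumptions: the function name, the `seed` parameter and the use of Python’s `random` module are hypothetical choices, while the dataset size (1000 throws), the 75%/25% split and K=50 folds come from the text.

```python
import random


def mccv_splits(n_samples, n_folds=50, train_fraction=0.75, seed=0):
    """Yield (train_idx, val_idx) index lists for Monte Carlo cross-validation.

    Every fold is an independent random split, so the train/validation ratio
    does not depend on the number of folds.
    """
    rng = random.Random(seed)
    n_train = int(round(train_fraction * n_samples))
    indices = list(range(n_samples))
    for _ in range(n_folds):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield shuffled[:n_train], shuffled[n_train:]


# 1000 throws, 50 folds, 75 % / 25 % split, as described in the text.
folds = list(mccv_splits(1000))

# Samples that never appeared in any validation sub-set.
never_validated = set(range(1000))
for _, val_idx in folds:
    never_validated -= set(val_idx)
```

With a 25% validation fraction, the probability that a given sample is never validated in 50 independent folds is 0.75^50, which is below one in a million; this is why increasing the number of folds mitigates the drawback mentioned above.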

FNNs with different topologies (layouts) were trained. For each layout, the validation errors over the 50 folds were computed and the FNN with the minimum validation error was selected. The trained FNN was then given the held-out test data in order to obtain the network test (generalization) error. This process is repeated for each FNN layout, and the final model is the one with the minimum test error.

The risk of model over-fitting is handled with the Early Stopping method, where the training stops after a continuous increase of the validation error over a predetermined number of consecutive iterations (250 iterations in the present study).
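The stopping rule can be sketched as follows. This is one reading of the description above (the validation error must increase on 250 consecutive iterations), not necessarily the exact criterion applied by the training software; the function name and the synthetic error curve are hypothetical.

```python
def early_stopping_iteration(val_errors, patience=250, max_iterations=5000):
    """Return the iteration at which training would stop.

    val_errors: validation error after each training iteration.
    Training stops once the validation error has increased for `patience`
    consecutive iterations, or when the last available iteration is reached.
    """
    last = min(len(val_errors), max_iterations) - 1
    consecutive_increases = 0
    for i in range(1, last + 1):
        if val_errors[i] > val_errors[i - 1]:
            consecutive_increases += 1
        else:
            consecutive_increases = 0
        if consecutive_increases >= patience:
            return i
    return last


# Synthetic curve: the error falls for 100 iterations, then rises steadily.
val_errors = [100 - k for k in range(100)] + [1 + k for k in range(1, 401)]
stop_at = early_stopping_iteration(val_errors)
best_at = val_errors.index(min(val_errors))
```

In this synthetic example the best validation error occurs at iteration 99 and training stops 250 iterations later, at iteration 349; the weights would normally be restored to those of the best iteration.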

3.3 Multi-Model Selection

3.3.1 Model with one feedforward neural network

In order to design an anticipation method that is sufficiently robust to sensor noise, the first models were trained with a range of signal-to-noise ratios (SNR), from 35 dB to 45 dB. Nevertheless, during the ball catching task simulation, a sensor with an SNR of 40 dB was assumed.

Figure 3.8 shows the validation and test MSE of neural networks with a single hidden layer and a varying number of neurons (from 2 to 20). The FNN predicts the 6-dimensional output of the anticipation method. Table 3.1 presents the position and velocity errors of the ball at the moment of release. Note that the prediction of the initial ball velocity is much less accurate than that of the initial ball position; therefore, the model with the minimum velocity error was selected (FNN with 5 neurons). Reliable prediction of the ball velocity at release proves to be of paramount importance because it defines the whole ball trajectory: small prediction errors may lead to significantly different ball trajectories. The single FNN model therefore appears to be a less suitable architecture in this context.
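Corrupting a signal with white Gaussian noise at a prescribed SNR can be sketched as below. The SNR is taken here as 10·log10(signal power / noise power), with the signal power estimated as the mean square of the samples; this definition is an assumption, since the text does not state how the SNR was computed. The function name and the example signal are hypothetical.

```python
import math
import random


def add_awgn(signal, snr_db, rng):
    """Add white Gaussian noise so that the result has the requested SNR.

    Assumes SNR = 10*log10(signal power / noise power), with the signal
    power estimated as the mean square of the samples.
    """
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in signal]


# Hypothetical hand coordinate along a throw, corrupted at 40 dB.
rng = random.Random(1)
clean = [0.3 + 0.5 * math.sin(0.01 * k) for k in range(20000)]
noisy = add_awgn(clean, 40.0, rng)

# Empirical check of the SNR actually obtained.
noise = [n - c for n, c in zip(noisy, clean)]
measured_snr_db = 10.0 * math.log10(
    sum(c * c for c in clean) / sum(e * e for e in noise))
```

At 40 dB the noise standard deviation is 1% of the root-mean-square value of the signal, so training over the 35 dB to 45 dB range exposes the networks to noise levels both above and below the one assumed in the simulation.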

Neurons   Position Error (m)   Velocity Error (m/s)
2         1.9001E-01           5.0138E-01
3         1.0623E-01           4.8873E-01
4         4.1053E-02           4.8158E-01
5         3.5492E-02           4.7437E-01
6         4.3517E-02           4.7625E-01
7         4.2183E-02           4.7488E-01
8         6.5857E-02           4.7701E-01
9         4.7866E-02           4.7734E-01
10        4.9312E-02           4.7682E-01
11        5.8555E-02           4.7781E-01
12        6.0481E-02           4.7962E-01
13        5.3669E-02           4.7909E-01
14        5.2936E-02           4.7779E-01
15        5.8357E-02           4.7526E-01
16        5.5351E-02           4.7888E-01
17        5.1825E-02           4.7554E-01
18        5.8475E-02           4.7957E-01
19        5.5852E-02           4.7966E-01
20        4.7124E-02           4.7900E-01

3.3.2 Model with two feedforward neural networks

The nonlinear regression from a multidimensional input (equation 3.4) to a six-dimensional output (equation 3.5) proved to be inaccurate in predicting the ball trajectory, as seen in Section 3.3.1, especially regarding the velocity component $S_{vel,T_1}^{(A)}$.

$$S_{t=0,\dots,L}^{(A)} = \{S_{pos,t=0,\dots,L}^{(A)}, S_{vel,t=0,\dots,L}^{(A)}\} \in \mathbb{R}^{6L} \qquad (3.4)$$

$$S_{T_1}^{(A)} = \{S_{pos,T_1}^{(A)}, S_{vel,T_1}^{(A)}\} \in \mathbb{R}^{6} \qquad (3.5)$$

As an attempt to solve this problem, we propose a model with two FNNs, where one FNN predicts the initial ball position $S_{pos,T_1}^{(A)} \in \mathbb{R}^3$ and a second FNN predicts the initial velocity $S_{vel,T_1}^{(A)} \in \mathbb{R}^3$. Figures 3.9 and 3.10 show the validation/test MSE of the FNNs for a varying number of hidden layer neurons. Table 3.2 shows the respective position/velocity test errors, with the best FNN for position prediction having 9 neurons and the best FNN for velocity prediction having 7 neurons. Note that splitting the model into two FNNs improved the prediction accuracy; however, the velocity component still presents a considerable error, which raises the question of whether further splitting would improve the results.

Neurons   Position Error (m)   Velocity Error (m/s)
2         1.3930E-01           4.9163E-01
3         2.4457E-02           4.7219E-01
4         2.4523E-02           4.7475E-01
5         2.4572E-02           4.7552E-01
6         2.4715E-02           4.7174E-01
7         2.4450E-02           4.7128E-01
8         2.4509E-02           4.7391E-01
9         2.4433E-02           4.7611E-01
10        2.5004E-02           4.8151E-01
11        2.4991E-02           4.7772E-01
12        2.4871E-02           4.7555E-01
13        2.5152E-02           4.7969E-01
14        2.4896E-02           4.7416E-01
15        2.5435E-02           4.7547E-01
16        2.4999E-02           4.8234E-01
17        2.5180E-02           4.8066E-01
18        2.4674E-02           4.8098E-01
19        2.4864E-02           4.7907E-01
20        2.5553E-02           4.7943E-01

Figure 3.9: Position FNN - Model Validation.


3.3.3 Model with four feedforward neural networks

Taking into account the results obtained in Section 3.3.2, the anticipation mechanism now consists of four FNNs: one FNN to predict the ball position ($S_{pos,T_1}^{(A)} \in \mathbb{R}^3$), and three FNNs to predict each component of the initial ball velocity ($S_{vel_X,T_1}^{(A)} \in \mathbb{R}$; $S_{vel_Y,T_1}^{(A)} \in \mathbb{R}$; $S_{vel_Z,T_1}^{(A)} \in \mathbb{R}$). Figures 3.11, 3.12 and 3.13 show the model validation of each FNN regarding the velocity components, since the FNN for position prediction was already trained. Table 3.3 shows a significant increase in error, which can be explained by over-fitting: in the validation figures, the validation error of the FNN of each component is much lower than its test error.

Neurons   Velocity X Error (m/s)   Velocity Y Error (m/s)   Velocity Z Error (m/s)

2 1.0134E+00 3.7153E-01 9.9627E-01

3 1.0100E+00 3.7052E-01 9.9534E-01

4 1.0062E+00 3.7160E-01 9.9672E-01

5 1.0146E+00 3.6126E-01 9.9530E-01

6 1.0110E+00 3.6747E-01 9.9568E-01

7 1.0133E+00 3.6451E-01 9.9584E-01

8 1.0142E+00 3.6741E-01 9.9582E-01

9 1.0160E+00 3.6084E-01 9.9604E-01

10 1.0101E+00 3.6560E-01 9.9660E-01

11 1.0200E+00 3.6267E-01 9.9599E-01

12 1.0131E+00 3.6303E-01 9.9681E-01

13 1.0141E+00 3.7208E-01 9.9586E-01

14 1.0162E+00 3.7132E-01 9.9586E-01

15 1.0146E+00 3.6408E-01 9.9627E-01

16 1.0069E+00 3.6535E-01 9.9666E-01

17 1.0071E+00 3.6085E-01 9.9581E-01

18 1.0197E+00 3.6498E-01 9.9593E-01

19 1.0121E+00 3.7023E-01 9.9640E-01

20 1.0050E+00 3.6726E-01 9.9605E-01


Figure 3.11: Velocity X FNN - Model Validation.

Figure 3.12: Velocity Y FNN - Model Validation.


3.4 Ball Catching Simulation

In the previous section, several model archetypes for implementing the early anticipation strategy were obtained. In addition to comparing the different archetypes by their respective test errors, this section evaluates the effectiveness of the neural network model in maximizing the catching task performance. These simulations aim to give an idea of how well the neural networks perform in a simulation environment, in order to determine whether any improvements are required.

3.4.1 Simulation assumptions

The simulations make some assumptions and approximations regarding the operation of both the perception and the actuation systems. First, we assume a purely parabolic motion of the flying ball by neglecting the effect of air resistance. Second, we address the ball catching problem from a kinematics viewpoint, so that dynamic constraints and control problems are not addressed. Note that an implicit assumption is that the flying ball and the robot’s end-effector are represented by points in 3D space. In line with this, planning the hand orientation and closing the hand to grasp the flying ball are two actions considered beyond the scope of this dissertation. Third, additive white Gaussian noise is added to the sensory signals to mimic the effects of uncertainty in the overall motion estimate. However, this study does not assess how noise in the actuators and delays present in all stages of the control system could limit the ability to act precisely.

The catching point for every prediction is the reachable point closest to the robot’s current position. This is a standard solution, in which the intersection points of the predicted ball trajectory with the robot’s reachable space are calculated. Among all possible solutions, the catching position is selected as the intersection point nearest to the current position of the robot’s end-effector. If there are no reachable points, that is, if the predicted ball trajectory does not intersect the robot’s workspace, the robot end-effector should remain in its current position or continue to move towards the last catching point obtained in a previous iteration.
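The selection of the catching point can be sketched as follows. Several elements are assumptions made only for illustration: the reachable space is modelled as a sphere centred on the robot, the crossings of the trajectory with its boundary are located by sampling the trajectory in time, and all names and numerical values are hypothetical. Only the selection rule (nearest crossing point, with a fallback to the previous target) is taken from the text.

```python
import math

G = 9.810665  # value used in equation (3.2), m/s^2


def ball_position(p0, v0, t):
    """Parabolic ball motion of equation (3.2)."""
    return (p0[0] + v0[0] * t,
            p0[1] + v0[1] * t,
            p0[2] + v0[2] * t - 0.5 * G * t * t)


def crossing_points(p0, v0, center, radius, t_end, dt=0.001):
    """Points where the predicted trajectory crosses the workspace boundary."""
    def signed(t):
        return math.dist(ball_position(p0, v0, t), center) - radius

    points = []
    for k in range(int(t_end / dt)):
        t_a, t_b = k * dt, (k + 1) * dt
        f_a, f_b = signed(t_a), signed(t_b)
        if f_a * f_b < 0.0:
            # Linear interpolation of the crossing time inside the step.
            t_c = t_a + dt * f_a / (f_a - f_b)
            points.append(ball_position(p0, v0, t_c))
    return points


def select_catching_point(candidates, effector_pos, last_target=None):
    """Nearest candidate to the end-effector, or the previous target if none."""
    if not candidates:
        return last_target
    return min(candidates, key=lambda p: math.dist(p, effector_pos))


# Hypothetical throw and spherical workspace (all values are illustrative).
p0, v0 = (0.0, 0.0, 1.5), (4.0, 0.0, 3.0)
center, radius = (3.5, 0.0, 0.5), 0.6
effector = (3.2, 0.0, 0.8)
candidates = crossing_points(p0, v0, center, radius, t_end=0.9)
target = select_catching_point(candidates, effector)
```

When the predicted trajectory misses the workspace, `candidates` is empty and the function returns the previous target, so the end-effector keeps moving towards the last valid catching point.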

3.4.2 Robot’s arm motion control

For the Matlab simulation, the classical serial manipulator with three revolute joints (i.e., spatial 3R) is adopted. In spite of its simplicity, the structure of this manipulator is important because it captures the general design of the first three joints and links of many 6-DOF manipulators. In this study, the problem of motion generation is solved by moving the robot arm from the current state to the predicted state (catching point) as fast as possible, while satisfying actuator constraints. On the one hand, it is assumed that the robot should re-plan its motion whenever there is a new prediction lying inside the reachable space. Thus, online computations based on feedback corrections are required to allow the robot to react to constant updates in the predicted catching point. On the other hand, the goal of the task is specified in terms of the end-effector position defined in Cartesian space, while the robot arm is actuated at the joint level. Therefore, an inverse kinematics algorithm is required to map from the task space into the joint space.

An effective solution to this particular problem can be devised by using a well-established Jacobian-based scheme for solving the inverse kinematics [49]. An overview of the implemented algorithm is as follows: first, the desired change in position of the end-effector is calculated as the difference between the predicted catching point, $r_d$, and the current position of the end-effector, $r$:

$e_r = r_d - r$ (3.6)

The error vector in Equation (3.6) provides the desired direction for moving the end-effector towards the target point. Multiplying the error vector by the inverse of the manipulator’s Jacobian matrix, $J^{-1}$, yields the joint velocities, $\dot{q}$, needed to drive the arm’s joints:

$\dot{q} = J(q)^{-1} e_r$ (3.7)

For the 3-DOF manipulator arm, a closed-form solution for the inverse Jacobian matrix is available, which avoids numerical inversion. The resulting joint velocities are then checked against their limits: regardless of whether a velocity is greater or less than its limit, a scaling is applied such that the maximum possible speeds are always used, while keeping the desired motion direction. The control of the arm is repeated, at a rate of 100 Hz, until the end-effector’s position is sufficiently close to the target (i.e., the error is below a threshold of 2 mm).
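The control loop can be sketched in Python under several assumptions that are not taken from this study: an anthropomorphic kinematic model for the 3R arm, hypothetical joint speed limits `QD_MAX`, and Cramer's rule in place of the closed-form inverse. The sketch also deviates in one respect: velocities are only scaled down when they exceed a limit, so that the final step lands on the target instead of overshooting it.

```python
import math

L1, L2, L3 = 0.3, 0.3, 0.3     # link lengths from Figure 3.15 (m)
QD_MAX = (3.0, 3.0, 3.0)       # hypothetical joint speed limits (rad/s)
DT = 0.01                      # control period for a 100 Hz loop (s)
TOL = 0.002                    # 2 mm stopping threshold (m)

def forward(q):
    """End-effector position of an assumed anthropomorphic 3R arm."""
    q1, q2, q3 = q
    reach = L2 * math.cos(q2) + L3 * math.cos(q2 + q3)
    return (math.cos(q1) * reach, math.sin(q1) * reach,
            L1 + L2 * math.sin(q2) + L3 * math.sin(q2 + q3))

def jacobian(q):
    """Analytical position Jacobian of forward()."""
    q1, q2, q3 = q
    c1, s1 = math.cos(q1), math.sin(q1)
    reach = L2 * math.cos(q2) + L3 * math.cos(q2 + q3)
    d2 = -L2 * math.sin(q2) - L3 * math.sin(q2 + q3)   # d(reach)/dq2
    d3 = -L3 * math.sin(q2 + q3)                       # d(reach)/dq3
    return [[-s1 * reach, c1 * d2, c1 * d3],
            [c1 * reach, s1 * d2, s1 * d3],
            [0.0, reach, L3 * math.cos(q2 + q3)]]

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(m, b):
    """Solve m x = b by Cramer's rule (fails at kinematic singularities)."""
    d = det3(m)
    return [det3([[b[i] if j == k else m[i][j] for j in range(3)]
                  for i in range(3)]) / d for k in range(3)]

def control_step(q, target):
    """One resolved-rate step: Equations (3.6) and (3.7) plus scaling."""
    e = [t - r for t, r in zip(target, forward(q))]       # Eq. (3.6)
    # Joint velocity that would cancel the error within one period.
    qd = [v / DT for v in solve3(jacobian(q), e)]         # Eq. (3.7)
    ratio = max(abs(v) / lim for v, lim in zip(qd, QD_MAX))
    if ratio > 1.0:
        qd = [v / ratio for v in qd]   # saturate, keeping the direction
    return [qi + v * DT for qi, v in zip(q, qd)]

def move_to(q, target, max_steps=1000):
    """Repeat the control step until the error drops below TOL."""
    for step in range(max_steps):
        if math.dist(forward(q), target) < TOL:
            return q, step
        q = control_step(q, target)
    return q, max_steps
```

For example, `move_to([0.0, math.pi / 2, -math.pi / 2], (0.35, 0.2, 0.5))` drives the arm from a non-singular start posture to a target inside the assumed workspace and returns the final joint angles together with the number of steps taken.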

This solution is attractive for several reasons. First, it is no longer necessary to predict the interception time or specify a given path. Ideally, the robot will reach the predicted target point prior to the ball’s arrival, i.e., the robot will wait for the flying ball (or, in the limit, catch it in motion). Second, this approach is well suited for a practical implementation of an online motion generation algorithm in which the end-effector trajectory is continuously modified based on sensory information. If a computationally less intensive solution is desirable, a solution based on the transpose of the Jacobian could also be devised.
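For reference, the transpose-based alternative is commonly written as below. The positive gain $\alpha$ is a symbol introduced here for illustration and does not appear in this study. For a fixed target, the squared error norm cannot increase under this law, which is why no matrix inversion is needed.

```latex
\dot{q} = \alpha\, J(q)^{T} e_r, \qquad \alpha > 0
% For a fixed target r_d, \dot{e}_r = -J(q)\,\dot{q}. With V = \tfrac{1}{2} e_r^{T} e_r:
\dot{V} = e_r^{T} \dot{e}_r = -\alpha\, \lVert J(q)^{T} e_r \rVert^{2} \le 0
```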

3.4.3 Results discussion

The overall catching scenario considers a human subject throwing a ball towards the robot from a distance of about 3-4 m, as illustrated in Figure 3.14. The ball trajectory is randomly generated, using a uniform distribution, such that the initial ball velocity varies between 4 m/s and 7 m/s, resulting in a flight time in the range from 0.5 s to 1.2 s. Figure 3.15 depicts the 3R spatial manipulator in the anthropomorphic posture adopted at the beginning of each trial (the best from the viewpoint of manipulability [50]). The reachable points are superimposed together with the sphere that delimits the work volume, considering the joint physical limits and the link lengths.


Figure 3.14: Matlab simulation environment.

Figure 3.15: Reachable space of the 3-DOF spatial serial manipulator (link parameters: l1 = 0.3 m, l2 = 0.3 m and l3 = 0.3 m) in the posture adopted at the beginning of each trial. Blue points represent the reachable space given joint physical limits defined by the following inequalities: q1 < 105°, 0 < q2 < 150° and q3 < 180°.
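A point cloud like the one in Figure 3.15 could be produced by sampling joint angles within their limits and applying the forward kinematics. The sketch below assumes an anthropomorphic kinematic model and reads the limits on q1 and q3 as symmetric bounds (the caption states only upper bounds). Both are assumptions, as are the function names.

```python
import math
import random

L1, L2, L3 = 0.3, 0.3, 0.3   # link lengths from Figure 3.15 (m)

# Joint limits in degrees. The symmetric lower bounds for q1 and q3
# are an assumption; the caption gives only the upper bounds.
LIMITS_DEG = ((-105.0, 105.0), (0.0, 150.0), (-180.0, 180.0))

def forward(q):
    """End-effector position of an assumed anthropomorphic 3R arm."""
    q1, q2, q3 = q
    reach = L2 * math.cos(q2) + L3 * math.cos(q2 + q3)
    return (math.cos(q1) * reach, math.sin(q1) * reach,
            L1 + L2 * math.sin(q2) + L3 * math.sin(q2 + q3))

def sample_reachable_points(n, seed=0):
    """Uniformly sample joint space and map each sample to Cartesian space."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        q = [math.radians(rng.uniform(lo, hi)) for lo, hi in LIMITS_DEG]
        points.append(forward(q))
    return points
```

Under this model, every sampled point lies within l2 + l3 = 0.6 m of the shoulder at (0, 0, l1), which corresponds to the sphere that delimits the work volume.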
