Localização de jogadores de basquetebol e estimativa da frequência cardíaca em jogos oficiais = aplicações baseadas em aprendizado de máquina e otimização combinatória = Basketball player localization and heart rate estimation in official games: applicati


UNIVERSIDADE ESTADUAL DE CAMPINAS

Faculdade de Ciências Aplicadas

LUCAS ANTÔNIO MONEZI

LOCALIZAÇÃO DE JOGADORES DE BASQUETEBOL E ESTIMATIVA DA FREQUÊNCIA CARDÍACA EM JOGOS OFICIAIS: APLICAÇÕES BASEADAS EM APRENDIZADO DE MÁQUINA E OTIMIZAÇÃO COMBINATÓRIA

BASKETBALL PLAYER LOCALIZATION AND HEART RATE ESTIMATION IN OFFICIAL GAMES: APPLICATIONS BASED ON MACHINE LEARNING AND COMBINATORIAL OPTIMIZATION

Limeira 2016


Dissertation presented to the Faculdade de Ciências Aplicadas of the Universidade Estadual de Campinas in partial fulfillment of the requirements for the degree of Master in Nutrition, Sport and Metabolism Sciences, in the concentration area of Biodynamics of Human Movement and Sport.

Advisor: Prof. Dr. Milton Shoiti Misuta

THIS COPY CORRESPONDS TO THE FINAL VERSION OF THE DISSERTATION DEFENDED BY THE STUDENT LUCAS ANTÔNIO MONEZI AND SUPERVISED BY PROF. DR. MILTON SHOITI MISUTA

Cataloging record

Universidade Estadual de Campinas
Library of the Faculdade de Ciências Aplicadas
Renata Eleuterio da Silva - CRB 8/9281

Information for the Digital Library

Title in English: Basketball player localization and heart rate estimation in official games: applications based on machine learning and combinatorial optimization
Keywords in English: Sports; Neural network; Kinematics
Concentration area: Biodynamics of Human Movement and Sport
Degree: Master in Nutrition, Sport and Metabolism Sciences
Examination committee: Milton Shoiti Misuta [Advisor]; Ricardo Machado Leite de Barros; Paulo Roberto Pereira Santiago
Defense date: 4 February 2016
Graduate program: Nutrition, Sport and Metabolism Sciences

M746L
Monezi, Lucas Antônio, 1987-
Localização de jogadores de basquetebol e estimativa da frequência cardíaca em jogos oficiais: aplicações baseadas em aprendizado de máquina e otimização combinatória / Lucas Antônio Monezi. – Campinas, SP : [s.n.], 2016.
Advisor: Milton Shoiti Misuta.
Dissertation (master's) – Universidade Estadual de Campinas, Faculdade de Ciências Aplicadas.
1. Sports. 2. Neural networks. 3. Kinematics. I. Misuta, Milton Shoiti, 1970-. II. Universidade Estadual de Campinas. Faculdade de Ciências Aplicadas. III. Title.

Author: Lucas Antônio Monezi

Title: Localização de jogadores de basquetebol e estimativa da frequência cardíaca em jogos oficiais: aplicações baseadas em aprendizado de máquina e otimização combinatória

Type: Master's defense

Institution: Universidade Estadual de Campinas

Date of defense: Limeira, 4 February 2016.

EXAMINATION COMMITTEE

Prof. Dr. Milton Shoiti Misuta (Advisor)

Signature

Prof. Dr. Ricardo Machado Leite de Barros

Signature

Prof. Dr. Paulo Roberto Pereira Santiago

Signature

*The defense record, with the respective signatures of the committee members, is kept in the student's academic file.

I would like to thank everyone who contributed, directly or indirectly, to the development of this work and to my education. First of all, I am immensely grateful to my family, my parents and my brother. Without doubt, this work and others would not exist without the help of two people, Professor Luciano Mercadante and my advisor, Milton Misuta, who always believed in me. Their commitment and dedication resulted in some important works, and I will always be grateful for their help in accomplishing them. I also cannot forget to thank everyone in the laboratory and its associates: Natália, Juliana, Carol Cirino, Carol Pahan, Heber, Cristian, Andy, Thiagão, Yura, Gui, Rene, Fabio, Jéssica and Bruna. To my professors, my immense gratitude for everything they taught me; I especially thank professors Leonardo Duarte, Alcides, Ropelle, Rodrigo, and all the others from whom I had the opportunity to learn. As I always say, “nobody teaches anything to anybody, but whoever wants to can learn from everyone”, and I learned a great deal from these people.

The complexity of sports must be investigated in competition, and non-invasive methods are therefore essential. Computer Vision, Image Processing, and Machine Learning can provide useful tools for designing non-invasive methods to acquire player position and player heart rate data in official basketball matches. In this work, we describe two important applications for basketball analysis using Machine Learning tools. In the first study, we proposed and evaluated a new video-based methodology for the automatic 3D localization of multiple basketball players. The aim of the second study was to explore the feasibility of using a neural network to estimate heart rate, with player kinematics (velocities, distance covered, and time played) and individual characteristics (anthropometry, age, and physical tests) as input variables. Both studies involve the development of non-invasive methods for acquiring relevant information about the players during games. The first application targeted a low-level datum, the player position on the court, which can be used to compute several high-level data related to technical and physical aspects and can also be used for tactical analysis. The other application aimed to estimate heart rate, a physiological parameter related to effort intensity. Measuring a player's heart rate during the game is a difficult task, because the athlete needs to wear an uncomfortable device and, in addition, the rules of many sports do not allow players to use such devices.

Sports complexity must be investigated in competition, and therefore non-invasive methods are essential. Computer Vision, Image Processing, and Machine Learning can provide useful tools to design non-invasive methods for acquiring player position and player heart rate data in official basketball matches. In this work, we describe two important applications of Machine Learning tools to basketball analysis. In the first study, we propose and evaluate a novel video-based framework for automatic 3D localization of multiple basketball players. The aim of the second study was to explore the feasibility of using a neural network to estimate heart rate with player kinematics (velocities, distance covered, and time played) and individual features (anthropometry, age, and performance tests) as inputs. Both studies involve the development of non-invasive methods for acquiring relevant information about the players during games. The first application targets low-level data, the player position on the court, which can be used to compute several high-level data related to technical and physical activities as well as for tactical analysis. The other application aims to estimate heart rate, a physiological parameter related to effort intensity. Measuring a player's heart rate during the game is a difficult task, because the athlete needs to wear an uncomfortable device and the rules of many sports do not allow players to use such devices.

Figure 1. Framework diagram with key tasks for player detection and 3D localization.
Figure 2. Player detection process over a delimited interest area (e), with some steps depicted. The video image (a) is used to model the background, a static image of the court (b); the foreground mask (c) is the result of the background subtraction task; (f) is a close-up view of three player head candidates.
Figure 3. Head enclosed by Circle Hough Transform.
Figure 4. Neural network and histogram of oriented gradients representations.
Figure 5. HOG features of positive and negative samples used as inputs of the neural network.
Figure 6. Neural network confusion matrix (Matlab® confusion matrix plot model); “head” is 1 and “non-head” is 0. True classifications are denoted by green squares (a, e) and false classifications by red squares (b, d). Blue squares indicate the overall rates (correctness and error rate) of classifications (i). Gray squares (c, f, g, h) show the conditional rates (correctness and error rate), given a determined target (3rd row, g, h) or given a determined output (3rd column, c, f). The performance was evaluated in game 1 for each subset.
Figure 7. Reproducibility of the classification by the neural network in an independent game.
Figure 8. (a) Rigid bar used for camera calibration, the court reference system adopted, and the calibration points (red, b) with at least two appearances.
Figure 9. Localization of 6 players by optimization (assignments shown in the same colors) and localization of 1 remaining point (c); planar representation of the basketball court (d).
Figure 10. Processing time (s) to solve the 3D localization of one player according to the number of points detected in the cameras.
Figure 11. Study design, player trajectory 75 s before an HR sample, and neural network architecture.
Figure 12. Regression plot results of testing in independent game 1 when the neural network was trained with games 2, 3, 4 and 5.
Figure 13. Differences between an output HR curve obtained with the neural network and the target HR curve from real polar data.

Table 1. Notation.
Table 2. Reconstruction functions system.
Table 3. Correlation values depicted by subsets for each game exclusion, E (Gn), on neural

1. CHAPTER I
1.1 INTRODUCTION
2. CHAPTER II
A VIDEO-BASED METHODOLOGY FOR 3D LOCALIZATION OF BASKETBALL PLAYERS: A CONSTRAINED COMBINATORIAL OPTIMIZATION APPROACH
2.1 ABSTRACT
2.2 INTRODUCTION
2.3 PROPOSED FRAMEWORK
2.3.1 Player Detection
2.3.2 3D Reconstruction
2.4 FRAMEWORK PERFORMANCE EVALUATION
2.4.1 Player Detection
2.4.2 Player Localization
2.5 DISCUSSION
2.6 CONCLUSION
3. CHAPTER III
NEURAL NETWORK TO ESTIMATE THE HEART RATE DURING BASKETBALL GAMES
3.1 ABSTRACT
3.2 INTRODUCTION
3.3 METHODS
3.3.1 Data Acquisition
3.4 RESULTS
3.5 DISCUSSION
3.6 ACKNOWLEDGMENTS
4. DISCUSSION
5. CONCLUSION
6. REFERENCES

1. Chapter I

1.1 Introduction

Recent advances in sports science are closely tied to technological progress. Nowadays, computerized video-based systems can be used to obtain information about team or player performance during a game. Research in sports science indicates that the dynamics of a sport need to be analyzed in competition; the development of non-invasive methods is therefore extremely important, so that measurement does not interfere with the athletes' performance. In this sense, tools from computer vision, image processing, and machine learning can be integrated to develop non-invasive methods for analyzing the dynamics of team sports during competition. The present work describes two applications involving machine learning, computer vision, and image processing tools for obtaining data relevant to basketball. The first application is a video-based methodology for 3D localization of multiple basketball players, presented in Chapter II. The location of the players on the court is the basic information for applications such as player tracking, which provides parameters related to physical performance, and for tactical analysis. Chapter III presents the second application, which explores neural networks to estimate heart rate from kinematic variables of the player during a basketball game. Measuring heart rate during a game is a difficult task, because the athlete must wear a heart rate monitor attached to the body, which is often not permitted by the rules; in addition, athletes resist wearing it because they feel it restricts their movement. Therefore, estimating heart rate with a neural network fed with kinematic data obtained by a video-based method may be a methodological solution for obtaining a physiological parameter that is vital to performance analysis.

The studies presented here address problems aimed at quantifying data about players during official games. Both problems are approached with supervised learning methods. Both applications use neural networks: in the first, the network is one part of the player detection stage, whereas in the second it is used to learn the relationship between the effort performed (kinematics) and its effect on the cardiorespiratory system (heart rate). Specifically, the neural network is used for binary classification in the first application (accepting or rejecting a player head candidate) and for nonlinear regression in the second (estimating the player's heart rate).

2. Chapter II

A video-based methodology for 3D localization of basketball players: a constrained combinatorial optimization approach

2.1 Abstract

Sports complexity must be investigated in competition, and therefore non-invasive methods are essential. Computer Vision, Image Processing, and Machine Learning can provide useful tools to design a non-invasive system for acquiring player position data in official basketball matches. Here we propose and evaluate a novel video-based framework for automatic 3D localization of multiple basketball players. The framework comprises two parts. The first is a player detection stage that aims to identify the players' heads at camera image level. This stage is based on background segmentation and on classification performed by an artificial neural network. The second stage is the 3D reconstruction of the player positions from the images provided by the different cameras used in the acquisition. This stage is tackled by formulating a constrained combinatorial optimization problem that minimizes the re-projection error while maximizing the number of detections in the formulated 3D localization problem.

Keywords: background segmentation, player segmentation, player detection, neural network, combinatorial optimization, greedy heuristic, basketball.


2.2 Introduction

Recent advances in sports science have been made possible by the development of suitable technology. For instance, computer-aided systems can be applied in several sports to obtain both high- and low-level data about the performance of players or teams. In basketball, a typical example of low-level data is the player position on the court; this quantity contains important information, as it can be used to compute several high-level data related to technical and physical activities as well as for tactical analysis. As claimed in the sports science literature (HOPKINS; HAWLEY; BURKE, 1999; MCGARRY et al., 2002), sports complexity must be analyzed in competition, which means that non-invasive methods should be preferred for acquiring data such as player position. In this respect, the fields of computer vision, image processing, and machine learning play a role, as they provide useful tools to design non-invasive video-based systems for acquiring data about player motion in official basketball matches.

In the last two decades, important contributions to individual and team sports analysis have been achieved through the development of video-based computer-aided systems (BARROS et al., 2007, 2011; FIGUEROA; LEITE; BARROS, 2006a; GOMEZ et al., 2014; INTILLE; BOBICK, 1995; IWASE; SAITO, 2004; MORAIS et al., 2014). Regarding team sports, the studies focus on player (BARROS et al., 2011; FIGUEROA; LEITE; BARROS, 2006a; MORAIS et al., 2014) and ball (OWENS; HARRIS; STENNETT, 2003; SPAGNOLO et al., 2013) tracking. In object tracking, where one is interested in obtaining the trajectory as a function of time, a previous step is required to detect the object of interest. Therefore, tracking by video-based methods necessarily comprises object detection and localization in the scene, which are then used to link the desired objects with their trajectories.

The objects of interest in team sports applications for game dynamics evaluation are the players, the referees, and the ball. As reported in the sports science literature (BARROS et al., 2007, 2011; FIGUEROA; LEITE; BARROS, 2006a; MORAIS et al., 2014), 2D approaches have been used in video-based applications for player or referee detection, localization, and tracking. In the 2D approach, only two spatial coordinates are taken into account. It is also possible to consider three spatial coordinates (3D approach). Surprisingly enough, 3D methods have mainly been considered for ball tracking applications (OHNO; MIURA; SHIRAI, 2000; OWENS; HARRIS; STENNETT, 2003; POLIAKOV et al., 2010). However, the vertical component of the player position is essential information in basketball analysis, since the players frequently jump during the game.

Player detection in basketball is not an easy task; artifacts such as player occlusion, strong shadows cast by the players, and abrupt reflections on the polished floor significantly affect the segmentation process (ALAHI et al., 2009). The midpoint between the feet or the bottom center of the player bounding box has been used as the reference point that determines the player position in the image and allows the reconstruction of the 2D player position on the court (BARROS et al., 2011; FIGUEROA; LEITE; BARROS, 2006a; IWASE; SAITO, 2004; LU; OKUMA; LITTLE, 2009). However, using the midpoint between the feet as a reference point may lead to problems in the segmentation stage, especially when the legs present the same color pattern as the basketball court. Since low error in player position is key for tracking algorithms, other reference points must be considered. A possible candidate is the player's head, which is less affected by the aforementioned artifacts. In fact, the shape, color, and size of the player's head provide more stable and invariant features. Moreover, choosing the head, which lies in court space, as the reference point for locating the player and detecting it in multiple cameras is consistent with a 3D reconstruction approach.

With these limitations and requirements in mind, we propose in this paper a video-based framework for automatic 3D localization of multiple basketball players. Our contribution is organized as follows. In Section 2.3, we present the two stages of our proposal: i) player detection and ii) 3D reconstruction. Then, in Section 2.4, we provide a set of numerical experiments to assess the performance of our proposal. In Section 2.5, we discuss the results, and Section 2.6 closes the paper with our conclusions.

Approval for video data collection was obtained from the Brazilian National Basketball League and Limeira Basketball Association.

2.3 Proposed Framework

The proposed framework, summarized in Figure 1, comprises two main parts. The first deals with the identification of the players' heads at camera image level. As detailed in Section 2.3.1, player detection is conducted after acquisition and requires image processing and machine learning procedures. The second stage of our proposal concerns the 3D reconstruction of player positions. This task, as discussed in Section 2.3.2, can be addressed by formulating a combinatorial optimization problem.

Figure 1. Framework diagram with key tasks for player detection and 3D localization.

2.3.1 Player Detection

Image acquisition is the first procedure required for player detection (Figure 1). For this purpose, a dedicated capture program was built (Vimba SDK, OpenCV, C/C++) to record and synchronize videos from multi-view cameras directly onto a notebook. The video data (1038 x 776, 5 Hz) used in this work were acquired using three static industrial FireWire cameras (Allied Vision Technologies GmbH©, with 6 mm lenses) mounted inside a protective cage at the highest possible place in the gym (~12 m from the ground). To extend the FireWire connection of the cameras and reach the right place for framing the court, a converter adapter was plugged into each camera using optical fiber (Gefen© Firewire 1394 400/800 Extender). Since aspherical lenses (C-mount, 6 mm) were used, it was necessary to apply a distortion correction to the entire image. The correction protocol involved a chessboard (planar pattern) that was moved to take images at different orientations, so that a closed-form solution was obtained and refined for modeling the radial distortion (ZHANG, 2000).

Player detection is based first on segmentation, which separates the parts of the image; in sports applications, the parts can be the court or playfield, the players, and balls or implements. Image processing tools were used to segment the basketball players. The basic idea was to separate the static image regions from the moving regions, performing background segmentation to extract the basketball court while keeping the players (Figure 2). The Gaussian Mixture-based Background/Foreground Segmentation Algorithm was used (ZIVKOVIC, 2004). With the background model, we can also detect shadows and relabel shadow pixels from foreground to background (PRATI et al., 2003). Finally, noise suppression based on morphological filtering (erosion followed by dilation) was applied to the foreground mask (FIGUEROA; LEITE; BARROS, 2006b).
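The noise-suppression step (erosion followed by dilation, i.e. a morphological opening) can be illustrated on a tiny binary mask. The sketch below is only an illustration in pure Python: the 3 x 3 structuring element, the function names, and the toy mask are assumptions, and the text does not state which kernel the capture program actually used.

```python
def erode(mask):
    """3x3 binary erosion: a pixel survives only if its whole 3x3 neighbourhood is foreground."""
    h, w = len(mask), len(mask[0])
    return [[int(all(0 <= r + dr < h and 0 <= c + dc < w and mask[r + dr][c + dc]
                     for dr in (-1, 0, 1) for dc in (-1, 0, 1)))
             for c in range(w)] for r in range(h)]


def dilate(mask):
    """3x3 binary dilation: a pixel is set if any pixel in its 3x3 neighbourhood is foreground."""
    h, w = len(mask), len(mask[0])
    return [[int(any(0 <= r + dr < h and 0 <= c + dc < w and mask[r + dr][c + dc]
                     for dr in (-1, 0, 1) for dc in (-1, 0, 1)))
             for c in range(w)] for r in range(h)]


def opening(mask):
    """Erosion followed by dilation, which removes blobs smaller than the kernel."""
    return dilate(erode(mask))


# Toy foreground mask (assumed data): a 3x3 "player" blob plus one isolated noise pixel.
foreground = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
cleaned = opening(foreground)
for row in cleaned:
    print(row)
```

In this toy case the isolated pixel disappears while the blob is restored to its original extent, which is the behaviour the filtering step relies on.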

The next step is to identify the players' heads. The first task is to estimate the contour of a player in the binary foreground image (SUZUKI; BE, 1985). Due to the high number of players per area, the contours found frequently contained more than one player. The highest point of the contour curve can be directly related to a player's head only if the contour contains a single player (Figure 2.f, player on the left); when the contour curve encloses multiple players (Figure 2.f, two players on the right), the highest point identifies just one of the heads. To overcome this issue, the locations of the global maximum and of the local maxima are both taken into account when searching for a circular pattern related to a player's head. A circle was then fitted in the grayscale foreground mask near the locations of the maxima, and the Circle Hough Transform (YUEN et al., 1989) was adopted to obtain the best circle (with proper head size) not too far from the original local maximum (Figure 3).

Figure 2. Player detection process over a delimited interest area (e), with some steps depicted. The video image (a) is used to model the background, a static image of the court (b); the foreground mask (c) is the result of the background subtraction task; (f) is a close-up view of three player head candidates.

Classification into “head” or “non-head” is the final part of player detection: the candidate points were classified by a previously trained multilayer perceptron neural network. The features used were the Histogram of Oriented Gradients, HOG (DALAL; TRIGGS, 2005), of a fixed square region around the candidate player point (Figure 3). The centers of the circles obtained are taken as the candidate points; these points may represent a raised arm (N1, Figure 5), a ball (N4, Figure 5), or any other non-head body segment. Candidate points are analyzed only if they lie inside the interest area (a predetermined polygon, Figure 2); thus, the high variability caused by spectators and objects outside the court does not affect the detection process.


Figure 3. Head enclosed by Circle Hough Transform.

The chosen architecture was a feed-forward multilayer perceptron with 10 neurons in a single hidden layer, and the neural network was trained with a back-propagation algorithm (RUMELHART; HINTON; WILLIAMS, 1986). To build, train, and test the classifier, we selected a total of 30,009 labelled samples (Figure 5) and analyzed whether the HOG features and neural network classification are suitable. The samples, collected from an official game (Game 1), were divided into three subsets: training (70%), validation (15%), and test (15%). Finally, reproducibility was checked in another, independent game (Game 2, with 2027 samples), with samples acquired in a scenario in which there were players wearing different jerseys (the visiting team) and players who never appeared in the dataset of 30,009 samples. The neural network input layer had 1764 neurons (Figure 4), which correspond to the values of the histogram of oriented gradients features (DALAL; TRIGGS, 2005).
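As a plausibility check on the input size, the length of a HOG descriptor can be computed from its layout. The text gives only the total (1764); the window, cell, block, stride, and bin values below are assumptions (common Dalal and Triggs style settings) that happen to reproduce that total, and the function name is hypothetical.

```python
def hog_feature_length(window, cell, block_cells, stride_cells, bins):
    """Length of a HOG descriptor for a square window.

    window, cell: side lengths in pixels; block_cells: block side in cells;
    stride_cells: block stride in cells; bins: orientation bins per cell.
    """
    cells_per_side = window // cell
    blocks_per_side = (cells_per_side - block_cells) // stride_cells + 1
    return blocks_per_side ** 2 * block_cells ** 2 * bins


# Assumed layout: 64x64 window, 8x8 cells, 2x2-cell blocks, 1-cell stride, 9 bins.
length = hog_feature_length(window=64, cell=8, block_cells=2, stride_cells=1, bins=9)
print(length)  # 7 * 7 blocks * 4 cells * 9 bins = 1764
```

Other layouts could give the same count, so this should be read as one consistent possibility rather than the configuration actually used.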


Figure 4. Neural network and histogram of oriented gradients representations.

The performance obtained by the neural network for the 30,009 samples in Game 1 is illustrated by the confusion matrix in Figure 6. The confusion matrix depicts the occurrences of true classification (“head” classified as “head”, or “non-head” classified as “non-head”), false positive classification (“non-head” classified as “head”), and false negative classification (“head” classified as “non-head”). A sample was labelled positive only if the head appears centered in the square region over which the HOG features are computed (e.g., in N3, Figure 5, the head is not centered). Finally, it is worth noting that the values presented in Figure 6 are the rates of the neural network classification and do not correspond to the rates of the player detection stage.
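The overall and conditional rates shown in a confusion matrix plot follow directly from the four cell counts. The sketch below shows the arithmetic; the counts are hypothetical placeholders, not the values reported in Figure 6, and the function and key names are chosen here for illustration.

```python
def confusion_rates(tp, fp, fn, tn):
    """Overall and conditional rates from binary confusion-matrix counts.

    tp: "head" classified as "head"; fp: "non-head" classified as "head";
    fn: "head" classified as "non-head"; tn: "non-head" classified as "non-head".
    """
    total = tp + fp + fn + tn
    return {
        "overall_correct": (tp + tn) / total,
        "correct_given_output_head": tp / (tp + fp),      # conditional on the output
        "correct_given_target_head": tp / (tp + fn),      # conditional on the target
        "correct_given_target_non_head": tn / (tn + fp),
    }


# Hypothetical counts, for illustration only.
rates = confusion_rates(tp=90, fp=5, fn=10, tn=895)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```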

Figure 6. Neural network confusion matrix (Matlab® confusion matrix plot model); “head” is 1 and “non-head” is 0. True classifications are denoted by green squares (a, e) and false classifications by red squares (b, d). Blue squares indicate the overall rates (correctness and error rate) of classifications (i). Gray squares (c, f, g, h) show the conditional rates (correctness and error rate), given a determined target (3rd row, g, h) or given a determined output (3rd column, c, f). The performance was evaluated in game 1 for each subset.

In Figure 7, we present the results obtained for Game 2 (2027 samples). Note that, although the neural network was trained with samples from Game 1, satisfactory results were also obtained for Game 2.

Figure 7. Reproducibility of the classification by the neural network in an independent game.

2.3.2 3D Reconstruction

A prerequisite for the 3D reconstruction of a given point is camera calibration. Camera calibration aimed to estimate the parameters of each camera that map the image coordinates of the player reference point (the head, in this case) to global coordinates associated with the court dimensions. After image distortion correction, the direct linear transformation, DLT (ABDEL-AZIZ; KARARA, 1971), was adopted for 3D camera calibration and player reconstruction (QINGCHAO et al., 1996; ROSSI et al., 2013).

In the calibration procedure, the line intersections on the plane of the basketball court were chosen as reference spots, and the positions of the intersections (2D) were obtained from the official FIBA rules manual. The origin of the global system was defined at the intersection of one of the lateral lines (X-axis) with one of the bottom lines (Y-axis). At each spot chosen on the court plane, we placed a rigid bar, oriented vertically (with a spirit level), with markings at known positions along its length. At these markings, a white Styrofoam ball (diameter ~15 cm) was fixed for better visualization in the camera images (Figure 8). Having measured some points in the image whose real coordinates are known, it becomes possible to solve the system built with equations (1) and (2) to estimate the eleven DLT parameters. Eleven unknowns require eleven or more equations; since each point pair provides two equations, a minimum of six point pairs between image and real measurements is necessary (ABDEL-AZIZ; KARARA, 1971; WOOD; MARSHALL, 1986). The re-projection errors of the calibration points with at least two appearances (36 points, Figure 8.b) were on average 0.026 m (X-axis), 0.031 m (Y-axis), and 0.043 m (Z-axis).


Figure 8. (a) Rigid bar used for camera calibration, the court system reference adopted, and the calibration points (red, b) with at least two appearances.

$$(n_{1k} - n_{3k}\,x_{pk})\,X + (n_{4k} - n_{6k}\,x_{pk})\,Y + (n_{7k} - n_{9k}\,x_{pk})\,Z + n_{10k} - x_{pk} = 0 \tag{1}$$

$$(n_{2k} - n_{3k}\,y_{pk})\,X + (n_{5k} - n_{6k}\,y_{pk})\,Y + (n_{8k} - n_{9k}\,y_{pk})\,Z + n_{11k} - y_{pk} = 0 \tag{2}$$
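For readers more familiar with the projective form of the DLT, equations (1) and (2) can be solved for the image coordinates. This is only an algebraic rearrangement, assuming the common denominator is non-zero:

$$x_{pk} = \frac{n_{1k}X + n_{4k}Y + n_{7k}Z + n_{10k}}{n_{3k}X + n_{6k}Y + n_{9k}Z + 1}, \qquad y_{pk} = \frac{n_{2k}X + n_{5k}Y + n_{8k}Z + n_{11k}}{n_{3k}X + n_{6k}Y + n_{9k}Z + 1}$$

Read with $(X, Y, Z)$ known and the eleven $n_{ik}$ unknown, each calibration point contributes two linear equations, so six points give $2 \times 6 = 12 \ge 11$ equations. Read with the $n_{ik}$ known and $(X, Y, Z)$ unknown, each camera contributes two equations, so two cameras give $4 \ge 3$ equations.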

Once the parameters were obtained, it was possible to reconstruct the X, Y, and Z coordinates in court space from at least two pairs of image coordinates.
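A minimal sketch of this reconstruction step is given below. It stacks equations (1) and (2) for each camera and solves the resulting overdetermined linear system for (X, Y, Z) by least squares through the normal equations. The least-squares solver is an assumption (the text only states that at least two image coordinate pairs are needed), and the camera parameters and the test point are invented for illustration.

```python
def project(n, point):
    """Image coordinates of a 3D point; n[1]..n[11] are the DLT parameters (n[0] unused)."""
    X, Y, Z = point
    d = n[3] * X + n[6] * Y + n[9] * Z + 1.0
    return ((n[1] * X + n[4] * Y + n[7] * Z + n[10]) / d,
            (n[2] * X + n[5] * Y + n[8] * Z + n[11]) / d)


def solve3(M, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    A = [row[:] + [rhs] for row, rhs in zip(M, b)]
    for i in range(3):
        pivot = max(range(i, 3), key=lambda r: abs(A[r][i]))
        A[i], A[pivot] = A[pivot], A[i]
        for r in range(i + 1, 3):
            f = A[r][i] / A[i][i]
            for c in range(i, 4):
                A[r][c] -= f * A[i][c]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        x[i] = (A[i][3] - sum(A[i][c] * x[c] for c in range(i + 1, 3))) / A[i][i]
    return x


def reconstruct(cameras, image_points):
    """Least-squares (X, Y, Z) from two or more cameras using equations (1) and (2)."""
    rows, rhs = [], []
    for n, (x, y) in zip(cameras, image_points):
        rows.append([n[1] - n[3] * x, n[4] - n[6] * x, n[7] - n[9] * x])
        rhs.append(x - n[10])
        rows.append([n[2] - n[3] * y, n[5] - n[6] * y, n[8] - n[9] * y])
        rhs.append(y - n[11])
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Atb = [sum(r[i] * v for r, v in zip(rows, rhs)) for i in range(3)]
    return solve3(AtA, Atb)


# Invented DLT parameters for two cameras (index 0 is a placeholder).
cameras = [
    [None, 40.0, 2.0, 0.01, 3.0, 35.0, 0.02, 1.0, 5.0, 0.03, 100.0, 80.0],
    [None, -5.0, 30.0, 0.02, 38.0, 4.0, -0.01, 2.0, -6.0, 0.015, 300.0, 50.0],
]
true_point = (5.0, 7.0, 1.9)  # hypothetical head position in metres
estimate = reconstruct(cameras, [project(n, true_point) for n in cameras])
print([round(v, 6) for v in estimate])
```

With noise-free projections the estimate matches the point that generated them; with real detections the residual of this system is what the re-projection error measures.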

The proposed 3D reconstruction process aims to estimate the 3D localization of all the players in the scene and is built upon a constrained combinatorial optimization problem. The underlying decision problem, which is to assign the detected points to a true player, can be modeled by the following assignment matrix

$$A_{m \times n} = \begin{bmatrix} A_{11} & \cdots & A_{1n} \\ \vdots & \ddots & \vdots \\ A_{m1} & \cdots & A_{mn} \end{bmatrix}, \qquad A_{pl} \in \{0, 1\} \tag{3}$$

where $A_{pl}$ is a binary (decision) variable that takes the value 1 when an indexed image point from player head detection (p) is related to a labeled player (l), and 0 otherwise.

In order to locate a given player in court space, a reconstruction must be performed considering only the points that represent that player. The problem is that we do not know the labels of the detected points, that is, the association between detected points and players is unknown. A possible solution would be to test all possible combinations, searching for the combination that minimizes the re-projection error. Unfortunately, this is a combinatorial optimization problem that may be extremely costly in terms of computational burden. Another difficulty is that the number of players in the scene is unknown.
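To give a feeling for the size of the search space, the sketch below counts the candidate point groups for a single player. It assumes that a group takes at most one detected point per camera and uses at least two cameras; this counting rule and the function name are assumptions made for illustration, not a formula from the text.

```python
from math import prod


def candidate_groups_for_one_player(detections_per_camera):
    """Number of ways to pick at most one point per camera, using at least two cameras."""
    # Each camera contributes either one of its points or nothing.
    all_choices = prod(d + 1 for d in detections_per_camera)
    # Remove the empty choice and the choices that use a single camera.
    return all_choices - 1 - sum(detections_per_camera)


# Three cameras with 10 head detections each (illustrative numbers).
count = candidate_groups_for_one_player([10, 10, 10])
print(count)  # 11**3 - 1 - 30 = 1300
```

This is the count for one player only; assigning all players jointly multiplies such choices together, which is why exhaustive testing quickly becomes impractical.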

Thus, to estimate the assignment matrix A, we propose a constructive greedy solution that initially locates a single player. Once the first player has been located, the image points related to the head of that player are dropped from the next iteration. This heuristic drastically decreases the number of required calculations. The 3D localization of new players stops when no more feasible solutions are available; if some player head image points have not yet been assigned, the method uses a priori information (Z equal to the mean player height) to locate the last players on the court.
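The control flow of such a constructive greedy procedure can be sketched on toy data. In the sketch below every detection is reduced to a single number and the disagreement within a group (`spread`) stands in for the re-projection error; the `score` function, the tolerance, and the data are all hypothetical. The fallback with a priori height for leftover points is not implemented; leftovers are simply returned.

```python
from itertools import combinations, product


def spread(group):
    """Toy stand-in for the re-projection error of a group of points."""
    return max(group) - min(group)


def score(group):
    """Toy score: smaller disagreement and more points are both rewarded (lower is better)."""
    return spread(group) / len(group) ** 2 - len(group)


def greedy_assign(detections, tolerance):
    """Repeatedly pick the best feasible group, then drop its points from later rounds."""
    remaining = {cam: list(points) for cam, points in detections.items()}
    players = []
    while True:
        cams = [cam for cam, points in remaining.items() if points]
        best = None
        for size in range(2, len(cams) + 1):           # at least two cameras per player
            for cam_subset in combinations(cams, size):
                for group in product(*(remaining[cam] for cam in cam_subset)):
                    if spread(group) > tolerance:      # infeasible group
                        continue
                    if best is None or score(group) < best[0]:
                        best = (score(group), cam_subset, group)
        if best is None:                               # no feasible solution left
            break
        _, cam_subset, group = best
        players.append(group)
        for cam, point in zip(cam_subset, group):
            remaining[cam].remove(point)
    return players, remaining


detections = {"cam1": [10, 50], "cam2": [11, 52], "cam3": [9]}
players, leftover = greedy_assign(detections, tolerance=3)
print(players)   # [(10, 11, 9), (50, 52)]
print(leftover)
```

Each round only searches over the points that are still unassigned, which is where the reduction in computation comes from.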

Table 1. Notation.

| Symbol | Description |
|---|---|
| $n$ | Number of players |
| $m$ | Number of points detected as player head |
| $p$ | Index of the point |
| $l$ | Index of the player label |
| $x_{lk}, y_{lk}$ | Player $l$ re-projected in the image coordinates of camera $k$ |
| $w$ | Total number of cameras |
| $k$ | Camera index |
| $X_l, Y_l, Z_l$ | Court-space coordinates of player $l$ |

Let us detail our approach (Table 1 presents the notation used here). We seek to optimize two cost functions: i) the minimization of the sum of the re-projection errors associated with the assigned points, given by

$$\min \sum_{l=1}^{n} \sum_{p=1}^{m} \left[ (x_{pk} - x_{lk})^2 + (y_{pk} - y_{lk})^2 \right] A_{pl} \tag{4}$$

and ii) the maximization of the number of assigned points, given by

$$\max \sum_{l=1}^{n} \sum_{p=1}^{m} A_{pl} \tag{5}$$

The rationale behind this second cost function is that the more image points are assigned, the better the approximation of the player locations. Of course, this does not hold for outlier points, which require additional constraints to prevent their assignment.

Note that the cost functions expressed in Equations (4) and (5) are conflicting, since the greater the number of assigned points, the greater the re-projection error. In view of this, we propose to merge them into a single cost function, given by:

$$\min f([A]) = \sum_{p=1}^{m} \left[ \frac{(x_{pk} - x_{lk})^2 + (y_{pk} - y_{lk})^2}{\left( \sum_{p=1}^{m} A_{pl} \right)^2} - \sum_{p=1}^{m} A_{pl} \right] A_{pl} \tag{6}$$

For the greedy solution, we solve the first round (l = 1) to obtain the location of the first player, and then proceed to the next rounds (2nd, 3rd, …), discarding the image points that have already been assigned.
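The trade-off encoded in Equation (6) can be illustrated with a small Python sketch. It assumes one reading of the merged cost for a single round: each assigned point contributes its squared re-projection error divided by the squared number of assigned points, minus that number. The function name and the error values are hypothetical.

```python
def merged_cost(squared_errors):
    """Merged cost of one candidate group of points for a single player.

    squared_errors: squared re-projection error (pixels^2) of each point
    with A_pl = 1 in the candidate. Points with A_pl = 0 contribute nothing.
    """
    count = len(squared_errors)
    if count == 0:
        return 0.0
    return sum(e / count ** 2 - count for e in squared_errors)


# Hypothetical candidates: adding a third, slightly worse point still
# lowers the cost because the reward for assigned points dominates.
two_points = merged_cost([4.0, 9.0])          # 13/4 - 4
three_points = merged_cost([4.0, 9.0, 16.0])  # 29/9 - 9
```

Under this reading, a candidate with more assigned points is preferred as long as the extra re-projection error stays moderate, which is why the tolerance constraint on the error is still needed to keep outliers out.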

The minimization of (6) must be conducted by considering the following set of constraints:

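$$\sum_{l=1}^{n} A_{pl} \le 1 \quad \forall\, p \tag{7}$$

$$\sum_{p=1}^{m} A_{pl} \le w \quad \forall\, l \tag{8}$$

$$\sum_{p=1}^{m} A_{pl} \ge 2 \quad \forall\, l \tag{9}$$

$$A_{pl} \left[ (x_{pk} - x_{lk})^2 + (y_{pk} - y_{lk})^2 \right] \le \varepsilon \quad \forall\, p \tag{10}$$

$$h_1 \le Z_l \le h_2 \tag{11}$$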

Constraint (7) means that each point p can be assigned to at most one player l. Constraint (8) means that, for every player, the number of assigned points must be less than or equal to the number of cameras. Constraint (9) means that, for every player, the number of assigned points must be greater than or equal to two (the minimum needed for 3D reconstruction). Finally, (10) sets the maximum tolerated re-projection error in pixels, and (11) imposes the head height limits.
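As a minimal sketch, the per-player constraints described above can be written as a feasibility check on one candidate group of points. The function name, the tolerance, and the height limits in the usage lines are hypothetical; constraint (7) is not checked here because, in the greedy scheme, it is enforced by removing already assigned points from later rounds.

```python
def is_feasible(squared_errors, head_height, n_cameras, eps, h_min, h_max):
    """Check constraints (8) to (11) for one candidate group of points.

    squared_errors: squared re-projection error of each assigned point.
    head_height: reconstructed Z coordinate of the candidate player (m).
    eps, h_min, h_max: hypothetical tolerance and head height limits.
    """
    count = len(squared_errors)
    enough_views = 2 <= count <= n_cameras                     # (8) and (9)
    within_tolerance = all(e <= eps for e in squared_errors)   # (10)
    plausible_height = h_min <= head_height <= h_max           # (11)
    return enough_views and within_tolerance and plausible_height


# Hypothetical values: three cameras, tolerance 25 px^2, head between 1.2 and 2.5 m.
ok = is_feasible([4.0, 9.0], 1.9, 3, 25.0, 1.2, 2.5)
single_view = is_feasible([4.0], 1.9, 3, 25.0, 1.2, 2.5)
```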

Having estimated the assignment matrix A, the reconstruction of the 3D positions of each player can be obtained by solving a set of algebraic equations, as shown in Table 2.

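Table 2. System of reconstruction equations.

p = 1:

$$\left[ (n_{1k} - n_{3k} x_{1k}) X_l + (n_{4k} - n_{6k} x_{1k}) Y_l + (n_{7k} - n_{9k} x_{1k}) Z_l + n_{10k} - x_{1k} \right] A_{1l} = 0$$

$$\left[ (n_{2k} - n_{3k} y_{1k}) X_l + (n_{5k} - n_{6k} y_{1k}) Y_l + (n_{8k} - n_{9k} y_{1k}) Z_l + n_{11k} - y_{1k} \right] A_{1l} = 0$$

p = 2:

$$\left[ (n_{1k} - n_{3k} x_{2k}) X_l + (n_{4k} - n_{6k} x_{2k}) Y_l + (n_{7k} - n_{9k} x_{2k}) Z_l + n_{10k} - x_{2k} \right] A_{2l} = 0$$

$$\left[ (n_{2k} - n_{3k} y_{2k}) X_l + (n_{5k} - n_{6k} y_{2k}) Y_l + (n_{8k} - n_{9k} y_{2k}) Z_l + n_{11k} - y_{2k} \right] A_{2l} = 0$$

$$\vdots$$

p = m:

$$\left[ (n_{1k} - n_{3k} x_{mk}) X_l + (n_{4k} - n_{6k} x_{mk}) Y_l + (n_{7k} - n_{9k} x_{mk}) Z_l + n_{10k} - x_{mk} \right] A_{ml} = 0$$

$$\left[ (n_{2k} - n_{3k} y_{mk}) X_l + (n_{5k} - n_{6k} y_{mk}) Y_l + (n_{8k} - n_{9k} y_{mk}) Z_l + n_{11k} - y_{mk} \right] A_{ml} = 0$$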
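The system in Table 2 is linear in the unknowns X, Y, and Z, so for the points assigned to one player it can be solved by least squares. The Python sketch below is one possible way to do this with the normal equations, not the Matlab code used in the study. It assumes that n1 to n11 are the DLT parameters of the camera in which each point was detected, stored in that order; the two parameter sets and the test point are invented for illustration.

```python
def project(n, X, Y, Z):
    """Project a court-space point with DLT parameters n[0..10] = n1..n11."""
    d = n[2] * X + n[5] * Y + n[8] * Z + 1.0
    x = (n[0] * X + n[3] * Y + n[6] * Z + n[9]) / d
    y = (n[1] * X + n[4] * Y + n[7] * Z + n[10]) / d
    return x, y


def det3(m):
    """Determinant of a 3 x 3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def reconstruct(observations):
    """Least-squares (X, Y, Z) from a list of (n, x, y) with A_pl = 1."""
    rows, rhs = [], []
    for n, x, y in observations:
        # Two equations per image point, as in Table 2.
        rows.append([n[0] - n[2] * x, n[3] - n[5] * x, n[6] - n[8] * x])
        rhs.append(x - n[9])
        rows.append([n[1] - n[2] * y, n[4] - n[5] * y, n[7] - n[8] * y])
        rhs.append(y - n[10])
    # Normal equations (R^T R) s = R^T b, solved with Cramer's rule.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(3)]
    d = det3(ata)
    if abs(d) < 1e-12:
        raise ValueError("degenerate camera configuration")
    solution = []
    for col in range(3):
        m = [row[:] for row in ata]
        for i in range(3):
            m[i][col] = atb[i]
        solution.append(det3(m) / d)
    return tuple(solution)


# Invented DLT parameters for two cameras and an invented head position.
cam1 = [1.0, 0.0, 0.01, 0.0, 1.0, 0.0, 0.2, 0.1, 0.0, 5.0, 3.0]
cam2 = [0.0, 0.3, 0.0, 1.0, 0.0, 0.02, 0.5, 1.0, 0.0, 1.0, 2.0]
true_point = (7.0, 4.0, 1.9)
obs = [(cam,) + project(cam, *true_point) for cam in (cam1, cam2)]
estimate = reconstruct(obs)
```

With noise-free synthetic projections the estimate matches the invented point up to floating-point error; with real detections the residual of this system is what the re-projection error measures.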

As already mentioned, because the number of cameras is limited and 3D reconstruction requires at least two points from different views, some detected points are not assigned to players and, thus, are not located in the court space. In these cases, we used a priori information to solve the DLT with a fixed mean height. Figure 9 shows an example of a 3D localization result obtained with the proposed framework for a given frame, with the player head detections (red asterisks) and the assignments (colored circles). The first three players localized, which are the top optimization results, are shown as blue, green, and red circles, respectively. In this example, 7 players were 3D localized by optimization and 1 player was located using a priori information (a remaining point detected only in camera 3) at a position near mid-court (white/red diamond).
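When Z is fixed to an assumed height, the two DLT equations of a single image point reduce to a 2 x 2 linear system in X and Y. The following Python sketch illustrates that idea under the assumption that n1 to n11 are the DLT parameters of the camera, in that order; the camera parameters, the point, and the height used below are invented for illustration.

```python
def project(n, X, Y, Z):
    """Project a court-space point with DLT parameters n[0..10] = n1..n11."""
    d = n[2] * X + n[5] * Y + n[8] * Z + 1.0
    x = (n[0] * X + n[3] * Y + n[6] * Z + n[9]) / d
    y = (n[1] * X + n[4] * Y + n[7] * Z + n[10]) / d
    return x, y


def locate_with_fixed_height(n, x, y, z_fixed):
    """Solve the two DLT equations of one image point for (X, Y), given Z."""
    a11, a12 = n[0] - n[2] * x, n[3] - n[5] * x
    a21, a22 = n[1] - n[2] * y, n[4] - n[5] * y
    # Move the known Z terms to the right-hand side.
    b1 = x - n[9] - (n[6] - n[8] * x) * z_fixed
    b2 = y - n[10] - (n[7] - n[8] * y) * z_fixed
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("degenerate configuration")
    X = (b1 * a22 - a12 * b2) / det
    Y = (a11 * b2 - b1 * a21) / det
    return X, Y


# Invented camera and a point seen by that camera only.
cam = [1.0, 0.0, 0.01, 0.0, 1.0, 0.0, 0.2, 0.1, 0.0, 5.0, 3.0]
x_img, y_img = project(cam, 7.0, 4.0, 1.9)
X_est, Y_est = locate_with_fixed_height(cam, x_img, y_img, 1.9)
```

The estimate is exact only when the assumed height equals the true head height; a player who is jumping or crouching would be shifted along the viewing ray of the camera.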

Figure 9. Localization of 6 players by optimization (points assigned to the same player share a color) and localization of 1 remaining point (c); representation on the basketball court plane (d). Panels: (a) Camera 1; (b) Camera 2; (c) Camera 3; (d) Basketball court.

2.4 Framework Performance Evaluation

The player detection accuracy was compared against manual measurements (ground truth) performed with the Dvideo system (Campinas, SP, Brazil) (BARROS et al., 2007; FIGUEROA; LEITE; BARROS, 2006a) by an expert operator (5 years of experience).


The detection rates for each camera were calculated by considering a detection to be true when the pixel distance between the detected player head and the ground truth was less than 25 pixels. Detections farther than 25 pixels from the ground truth were considered false positives. A misdetection was counted when no point was found near a manual measurement. The measurements were performed only inside the area of interest (a predetermined polygon). The player localization performance was evaluated as well, by computing the distance in meters between the player position reconstructed by the proposed framework and the expert's manual measurement in Dvideo.
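The counting rule above can be sketched in Python as follows. The nearest-neighbor matching used here is an assumption made for illustration, since the text does not state how detections were paired with manual measurements, and the coordinates are invented.

```python
import math


def evaluate_detections(detections, ground_truth, max_dist=25.0):
    """Count true detections, false positives, and misdetections in one frame.

    detections, ground_truth: lists of (x, y) pixel coordinates.
    Each manual measurement is paired with its nearest unused detection.
    """
    unused = list(detections)
    true_det = missed = 0
    for gx, gy in ground_truth:
        best = min(unused,
                   key=lambda d: math.hypot(d[0] - gx, d[1] - gy),
                   default=None)
        if best is not None and math.hypot(best[0] - gx, best[1] - gy) < max_dist:
            true_det += 1
            unused.remove(best)
        else:
            missed += 1
    # Detections left unpaired are counted as false positives.
    return {"true": true_det, "false_positive": len(unused), "missed": missed}


# Invented frame: one head found, one head missed, two spurious detections.
counts = evaluate_detections(
    detections=[(100, 100), (300, 300), (500, 50)],
    ground_truth=[(105, 98), (400, 400)],
)
```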

2.4.1 Player Detection

The results for 10,164 detections were as follows: for cameras 1, 2, and 3, respectively, the true detection rates were 78.9%, 68.9%, and 79.8%; the false positive rates were 2%, 1.2%, and 0.5%; and the misdetection rates were 19.1%, 29.9%, and 19.7%. For true detections, the root mean squared detection error was 6.59 pixels.

2.4.2 Player Localization

The computation of the player localization error only accounted for players present in at least two cameras (2941 samples). The signed median errors for the court space coordinates were: X-axis, -0.02 m (lower quartile Q1, -0.04 m; upper quartile Q3, +0.17 m); Y-axis, +0.01 m (Q1, -0.01 m; Q3, +0.03 m); and Z-axis, +0.05 m (Q1, +0.03 m; Q3, +0.08 m). The median absolute errors were 0.03 m (X-axis), 0.02 m (Y-axis), and 0.06 m (Z-axis). The root mean squared error in the court plane (X and Y axes) was 0.16 m. The median Euclidean distance error in the court space was 0.08 m (Q1, 0.06 m; Q3, 0.10 m). The root mean squared error in the court space (X, Y, and Z axes) was 0.18 m.

For the remaining points that were not assigned (917 samples), the errors were as follows. The signed median errors for the court space coordinates were +0.002 m (X-axis), +0.01 m (Y-axis), and +0.05 m (Z-axis). The median absolute errors were 0.17 m (X-axis), 0.10 m (Y-axis), and 0.08 m (Z-axis). The root mean squared error in the court plane (X and Y axes) was 0.30 m. The root mean squared error in the court space (X, Y, and Z axes) was 0.33 m.

The processing time for the player assignment by combinatorial optimization grows exponentially with the number of player head detections. The processing time was measured with the preliminary Matlab® code (not parallelized) and is shown in Figure 10. These measurements account for the 3D localization of just one player; the localization of the next player, however, is solved with at least two fewer points.

Figure 10. Processing time (s) to solve the 3D localization of one player as a function of the number of points detected in the cameras.

2.5 Discussion

The framework presented in the previous sections and its performance are discussed in detail in this section. Player detection and localization are inherent steps of video-based methods and play an important role in tracking. We therefore discuss how the tracking methods reported in the literature and our proposal deal with player detection and localization, and how they perform in determining the player position compared with our approach.

Systems for data acquisition in sports must be reliable and feasible, and a complete, automatic solution for measuring player positions on the basketball court cannot be achieved using knowledge from a single field. Integrating tools from different fields was therefore necessary for our approach to achieve player localization. The growing number of player tracking studies that integrate methods from several fields suggests that such integration helps to push beyond the frontiers of each field. Image processing combined with graph representation (FIGUEROA; LEITE; BARROS, 2006a), AdaBoost detection combined with a particle filter (LU; OKUMA; LITTLE, 2009; MORAIS et al., 2014), AdaBoost detection combined with graph representation (BARROS et al., 2011), and image processing combined with clustering (CHEN et al., 2012) are some examples of integration in the literature.

Combining techniques from image processing, machine learning, computer vision, and optimization was vital for the localization of multiple basketball players on the court. With the pieces in place, we addressed several difficult tasks in basketball player localization: seeking a head pattern with a circle Hough transform, coupling a neural network classifier to reject non-head points, and using optimization to choose the best assignment.

State-of-the-art video-based methods for player detection and tracking in team sports consider the 2D position. In the context of basketball analysis, however, kinematic variables that include the vertical component of the player position are essential, since players frequently jump during the game. Our principal contribution therefore lies in considering the 3D position of a given reference point on the player.

Instead of using the bottom center of a bounding box or silhouette, which tries to represent the feet, the head of the player was chosen as the reference point, and it plays a key role in our framework. The reason was two-fold: to perform a 3D reconstruction with a point that lies in the court space (Z ≠ 0), and to deal with the many occlusions. The player head point was more stable, being robust to occlusions and to illumination effects across the court.

Our proposed framework comprises two main parts: i) the detection of the players' heads in the camera images, and ii) the 3D reconstruction of the players' positions. Starting with the first part, detection, we present the accuracy reported in the literature. We then continue with the second part, discussing the accuracy of the methods in estimating the player position.

The player detection rate was around 71%. In other studies of indoor team sports, the reported detection rates were 74% in a handball study, when the detector was applied to a game different from the training game (BARROS et al., 2011); 70.5% in a basketball study (IVANKOVIC et al., 2012); and 73% in another basketball study (DELANNAY; DANHIER; DE VLEESCHOUWER, 2009). Still in basketball, a detection approach with a mixed network of planar and omnidirectional cameras achieved a recall of 0.76 and a precision of 0.72 (ALAHI et al., 2009).

Studies of outdoor team sports have also evaluated player detection. Experimental results from a soccer study reported detection rates of 81.50% and 78.03% for two player detection methods based on a neural network and on Viola and Jones AdaBoost, respectively (LEHUGER; DUFFNER; GARCIA, [s.d.]). A method for automatic tracking of soccer players automatically located the players in 94% of the video frames (BARROS et al., 2007). Although studies of automatic detection and tracking in outdoor team sports used several approaches that also appear in indoor applications, the camera setup (quantity, resolution, viewpoint) and the problems faced are slightly different, such as the number of players to be detected, the environment features, and the spatial organization of the players, which does not allow a direct comparison with our basketball results.

The median error of about 10 pixels in determining the head position in the image seems appropriate given our image resolution of 1038 x 776 pixels. An average RMS error of about 3.4 pixels was found in a study targeting indoor sports applications (handball and basketball), with an image size of 348 x 288 pixels obtained from cameras on the gym ceiling (KRISTAN et al., 2009). The average error of a hockey tracking method in determining the foot position in the image was 20% of the width of the ground-truth box; that study did not report the error in pixels (LU; OKUMA; LITTLE, 2009).

Proceeding to the accuracy of the position on the court, even with a limited number of cameras (three), the proposed framework was able to detect and 3D localize multiple basketball players. The root mean squared 3D error of 0.18 m was suitable for player localization on the basketball half court (14 m x 15 m). Moreover, changing the values of the optimization constraint parameters and adding cameras may decrease the errors. To put these errors in context, MORAIS et al. (2014) reported an average cumulative error of 0.60 m for 2D trajectories, using a multiple-camera methodology developed for futsal (20 m x 40 m), with the errors attenuated by fusing AdaBoost (VIOLA; JONES, 2001) detections from four cameras and by using a player appearance model. BARROS et al. (2011), in a handball tracking study, reported a mean value of 0.20 m associated with the uncertainty in the position of visible court points, not with the player position error. Experiments on handball player tracking with ceiling cameras showed an RMS error in player position of 0.28 m near the optical axis and 0.36 m at the court boundary (PERŠ; BON; KOVAČIČ, 2001). An automatic soccer tracking study found a spatial resolution of 0.3 m (BARROS et al., 2007). Finally, a study on automatic tracking of indoor 5-a-side football (18 m x 32 m) achieved a mean position error of 1.16 m, with a modal value below 40 cm, compared with manual tracking (NEEDHAM; BOYLE, 2001).

In our approach, we addressed the problem of multiple player localization in basketball with a video-based framework. Another alternative for measuring player position is the global positioning system (GPS); however, its errors are too large even outdoors (GRAY et al., 2010), which makes indoor use impracticable. To give an idea, around 50% of GPS coordinates were within 2.5 m in a static position test (MOHAMAD; ALI; ISMAIL, 2009). Moreover, GPS often does not work in basketball gyms, and the rules of many sports do not allow players to wear GPS devices.

Since no temporal information was used in our proposed framework, the results could be further improved by using the player trajectory to predict the current position and by filtering the trajectory to discard outlier positions. However, the temporal linking of player detections is not the aim of this study and can be investigated in future work.

2.6 Conclusion

A video-based framework for automatic 3D localization of multiple basketball players was described in the context of official games. Player detection based on image processing techniques and classification presented satisfactory results considering the complexity of the basketball game. In particular, the classification was essential for properly rejecting false head candidate points (for example, other body parts such as an arm).


A combinatorial optimization problem was solved with a greedy heuristic, with satisfactory results for the accuracy of the player position and for the determination of the number of players in the scene. The 3D player position on the court is crucial for basketball performance analysis due to the nature of this sport. This work supports the further development of systems aimed at gathering 3D player position data in competition, and the application can be extended to other indoor team sports in which the vertical component is relevant.


3. Chapter III

Neural Network to Estimate the Heart Rate During Basketball Games

3.1 Abstract

Heart rate (HR) has been related to exercise intensity; however, intermittent exercise does not allow the classical examination of the relationship between a workload and a physiological response such as the HR. Player motion, described by the velocity during the game, could represent the intensity of the effort, but its relationship with the HR is not linear. In official games, measuring player HR is a difficult task, because the athlete needs to wear an uncomfortable portable device fixed near the chest. Despite these difficulties in HR acquisition, the player velocity and several kinematic variables can be obtained without breaking any rule of the sport by non-invasive video-based player tracking methods. Thus, the purpose of this study was to explore the feasibility of using a neural network to estimate the HR with kinematics (velocities, distance covered, and time played) and individual features (anthropometric data, age, and performance tests) as inputs. The specific objective was to evaluate the regression when the trained neural network is applied to a different game. The modeling results with the preliminary data explored in this study indicated that features are lacking to estimate the HR, perhaps due to the high variability of the HR, which may be related to several factors.


3.2 Introduction

Heart rate (HR) has been related to exercise intensity; several studies have reported the relationship of HR with continuous, endurance, and isometric exercise (MARINO et al., 2002; WILES et al., 2008). However, intermittent exercise does not allow the classical examination of the relationship between a workload and a physiological response such as the HR. Player motion, described by the velocity during the game, could represent the intensity of the effort, but its relationship with the HR is not linear; many other variables can influence the HR during a game in a nonlinear way.

In official games, measuring player HR is a difficult task, because the athlete needs to wear an uncomfortable portable device fixed near the chest. Moreover, the rules of many sports restrict or prohibit the use of these devices, and players, in general, do not like to wear them, believing that they limit breathing. Despite these difficulties in HR acquisition, the player velocity and several kinematic variables can be obtained without breaking any rule of the sport by non-invasive video-based player tracking methods (BARROS et al., 2007; FIGUEROA; LEITE; BARROS, 2006a), which use the obtained player trajectory to derive the velocity curve.

Inspired by biological concepts, artificial neural networks are known for their ability to learn complex functions and can be used for classification or regression on nonlinear data. Indeed, if a pattern exists between the HR and other variables, a neural network may be able to model it. A neural network approach using HR data to estimate energy expenditure has been proposed (GARCÍA-MASSÓ et al., 2014). Although neural networks have been applied in many fields, such as computer vision and speech recognition (DENG; YU, 2014), sports applications remain largely unexplored, and few studies exist.

Thus, the purpose of this study was to explore the feasibility of using a neural network to estimate the HR with kinematics (velocities, distance covered, and time played) and individual features (anthropometric data, age, and performance tests) as inputs. The specific objective was to evaluate the regression when the trained neural network is applied to a different game.


3.3 Methods

The participants were 9 male elite players from the same team who played 5 games from the 2011/2012 season of the New Brazilian Basketball League, organized by the National Basketball League. The participants had a mean age of 27.89 ± 5.80 years, mean body mass of 92.31 ± 11.94 kg, mean body fat percentage of 10.63 ± 4.36% and mean height of 193.91 ± 6.88 cm. They signed an informed consent form, and the Local Institutional Review Board approved the research.

For this exploratory study, we selected a total of 12,910 HR samples obtained from 9 athletes of one team during 5 official games. We then trained the neural network without the samples from one game and evaluated the regression on the excluded game (Figure 11). This protocol was performed for each of the five games. The chosen architecture was a feed-forward multilayer perceptron with 30 neurons in one hidden layer, and the training was run on the GPU with a scaled conjugate gradient backpropagation algorithm. To train the neural network, we divided the samples from the four remaining games into three subsets: training (70%), validation (15%), and dependent test (15%). Training ended when any of the following conditions was met: (a) 1000 iterations had been executed, (b) the validation mean squared error had not improved after 6 iterations, or (c) the mean squared error reduction was less than 0.000001. After training, an independent test was performed on the excluded game. Matlab 2014a (MathWorks, USA) was used to perform the neural network training and evaluation.
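The data-splitting part of this protocol can be sketched in Python as below. This is only an illustration of the leave-one-game-out split with a 70/15/15 division of the remaining samples; it assumes a random shuffle, does not reproduce the Matlab training itself, and the sample layout and names are hypothetical.

```python
import random


def leave_one_game_out(samples, seed=0):
    """Yield one data split per excluded game.

    samples: list of (game_id, features, hr) tuples (hypothetical layout).
    The samples of the remaining games are shuffled and divided into
    training (70%), validation (15%), and dependent test (the rest).
    """
    rng = random.Random(seed)
    games = sorted({game for game, _, _ in samples})
    for held_out in games:
        rest = [s for s in samples if s[0] != held_out]
        independent = [s for s in samples if s[0] == held_out]
        rng.shuffle(rest)
        n_train = round(0.70 * len(rest))
        n_val = round(0.15 * len(rest))
        yield {
            "excluded_game": held_out,
            "train": rest[:n_train],
            "validation": rest[n_train:n_train + n_val],
            "dependent_test": rest[n_train + n_val:],
            "independent_test": independent,
        }


# Toy data: 5 games with 20 placeholder samples each.
toy = [(g, None, 100 + i) for g in range(1, 6) for i in range(20)]
splits = list(leave_one_game_out(toy))
```

Keeping the excluded game entirely out of the shuffle is what makes the last test independent: no sample from that game can influence training or early stopping.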

The neural network inputs (Figure 11) were: the player velocities (560 inputs, corresponding to ~75 seconds of the velocity curve before the HR sample), the accumulated distance covered by the player in the game up to the HR sample (1 input), the time played up to the HR sample (1 input), and individual features (15 inputs describing anthropometric data, age, and performance tests). The velocities and distance covered were computed from the player trajectory obtained by a video-based method.
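As a rough sketch, one input vector could be assembled as follows, giving 560 + 1 + 1 + 15 = 577 values per HR sample. The function and argument names are hypothetical, and the ordering of the inputs is an assumption made for illustration.

```python
VELOCITY_WINDOW = 560      # velocity samples preceding each HR sample
N_INDIVIDUAL_FEATURES = 15


def build_input(velocities, end_index, distance_covered, time_played,
                individual_features):
    """Assemble one network input vector for the HR sample at end_index.

    velocities: velocity curve of the player (one value per video frame).
    end_index: index of the frame synchronized with the HR sample.
    """
    if end_index < VELOCITY_WINDOW:
        raise ValueError("not enough velocity history before this HR sample")
    if len(individual_features) != N_INDIVIDUAL_FEATURES:
        raise ValueError("expected 15 individual features")
    window = velocities[end_index - VELOCITY_WINDOW:end_index]
    return list(window) + [distance_covered, time_played] + list(individual_features)


# Toy values only: a constant velocity curve and placeholder features.
vector = build_input([2.0] * 1000, 800, 1500.0, 600.0, [0.0] * 15)
```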


Figure 11. Study design, player trajectory 75 s before an HR sample, and, neural network architecture.

3.3.1 Data Acquisition

The kinematics were obtained by a video-based method. The procedures detailed below are the video recording, the measurement of player trajectories in the video, and the 2D reconstruction of the trajectories on the court plane. All these steps relate to the acquisition of the distance covered and the velocity. For the kinematic data acquisition, the games were first recorded by four camcorders (JVC® GZHD10, FullHD, 30 Hz), mounted at the same fixed positions in the corners of the gym, at the highest possible point above the ground (12 m from the court). Approval for video data collection was obtained from the Brazilian National Basketball League. The video recordings were transferred to a computer and converted to AVI format (7.5 Hz frame rate). The DVideo system (Campinas, SP, Brazil) was then used for the measurement of screen coordinates, the calibration and time synchronization of the cameras, and the reconstruction of the two-dimensional coordinates of the players on the court (BARROS et al., 2007).

The screen coordinates of each player in the image sequences were measured manually, and the position was estimated in each frame by the operator, considering the projection of the player position onto the court plane. For calibration, sixteen points (line intersections on the court plane) with known 2D coordinates were measured in a two-dimensional global reference system (laser measurements with a Leica® DISTO D5). The origin of the global system was defined at the intersection of one of the sidelines (x-axis) with one of the baselines (y-axis). The cameras were temporally synchronized by a common event. The two-dimensional reconstruction of player position was based on the Direct Linear Transformation (DLT) method.

The 2D coordinates of player position vs. time were smoothed separately using a fourth-order Butterworth low-pass filter with a cutoff frequency of 0.45 Hz, determined in another study (MONEZI et al., 2013). From the position vs. time data, the displacements between successive frames were accumulated into a distance curve, and the velocities were obtained from the differences of this cumulative distance between successive frames.
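The distance and velocity computation can be sketched in Python as follows. The sketch assumes already smoothed coordinates sampled at 7.5 Hz and omits the Butterworth filtering step; the function names and the toy trajectory are illustrative.

```python
import math


def cumulative_distance(xs, ys):
    """Cumulative sum of displacements between successive frames (m)."""
    total = [0.0]
    for i in range(1, len(xs)):
        step = math.hypot(xs[i] - xs[i - 1], ys[i] - ys[i - 1])
        total.append(total[-1] + step)
    return total


def velocities(xs, ys, frame_rate=7.5):
    """Velocity (m/s) as the difference of cumulative distance per frame."""
    dist = cumulative_distance(xs, ys)
    return [(dist[i] - dist[i - 1]) * frame_rate for i in range(1, len(dist))]


# Toy trajectory: 0.5 m per frame at 7.5 Hz corresponds to 3.75 m/s.
toy_x = [0.0, 0.4, 0.8]
toy_y = [0.0, 0.3, 0.6]
toy_distance = cumulative_distance(toy_x, toy_y)
toy_velocity = velocities(toy_x, toy_y)
```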

The HR data were acquired using 9 Polar monitors (Polar Team Sport System, Polar Electro Oy, Finland) at a frequency of 0.2 Hz. The HR signal was synchronized with the player tracking so as to match, in time, the velocity signal and the accumulated distance covered curve.

Three anthropometric features were also used: height, body mass, and body fat percentage. Age was added as another feature. The body fat percentage was estimated using a 7-site skinfold-thickness protocol (JACKSON; POLLOCK, 1978). We also used the values from several performance tests as inputs: the best, worst, mean, and total times of the Shuttle Run (15 m) (CASTAGNA et al., 2007); fatigue index; maximal HR in the Shuttle Run; highest HR in the game; estimated maximal HR (220 – age); lactate threshold (CASTAGNA et al., 2010); HR at the lactate threshold; and percentage of maximal HR at the lactate threshold. In total, we analyzed 15 additional features (3 anthropometric, 1 for age, and 11 from performance tests).

3.4 Results

The correlation values are presented for each excluded game (Table 3). For example, testing on the independent game 1, with the neural network trained on games 2, 3, 4, and 5, gave the best fit of the HR model (Figure 12), with a correlation, R, of 0.76 (coefficient of determination, R², 0.58). The average correlation obtained in the modeling was 0.62 ± 0.09 (coefficient of determination, R², 0.38). An example of the differences between an output HR curve obtained with the neural network and the target HR curve from the measured Polar data is also presented (Figure 13).
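For reference, the correlation between estimated and measured HR and the corresponding coefficient of determination can be computed as in the Python sketch below. The HR values are invented for illustration, and the R² lines only reproduce the arithmetic on the reported correlations (squaring and rounding to two decimals).

```python
import math


def pearson_r(estimated, measured):
    """Pearson correlation between network output and target HR."""
    n = len(estimated)
    mean_e = sum(estimated) / n
    mean_m = sum(measured) / n
    cov = sum((e - mean_e) * (m - mean_m) for e, m in zip(estimated, measured))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    return cov / math.sqrt(var_e * var_m)


# Invented HR values (beats per minute), for illustration only.
r_toy = pearson_r([150.0, 160.0, 170.0], [148.0, 163.0, 171.0])

# Squaring the reported correlations gives the reported R² values.
r2_best = round(0.76 ** 2, 2)
r2_mean = round(0.62 ** 2, 2)
```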

Table 3. Correlation values by subset for each game exclusion, E (Gn), in the neural network training.

          Train (T)   Validation (V)   Dependent Test (DT)   All (T+V+DT)   Independent Test (IT)
E (G1)    0.64        0.62             0.61                  0.63           0.76 (n=2663)
E (G2)    0.72        0.70             0.70                  0.71           0.62 (n=2549)
E (G3)    0.64        0.62             0.66                  0.64           0.52 (n=3414)
E (G4)    0.70        0.69             0.67                  0.69           0.59 (n=1613)
E (G5)    0.63        0.66             0.63                  0.64           0.59 (n=2671)
Mean      0.67        0.66             0.65                  0.66           0.62
Std       0.041       0.038            0.035                 0.035          0.089


Figure 12. Regression plot results of testing in independent game 1 when the neural network was trained with games 2,3,4 and 5.

Figure 13. Differences between an output HR curve obtained with neural network and target HR curve from real polar data.


3.5 Discussion

The modeling results with the preliminary data indicated that features are lacking to estimate the HR, perhaps due to the high variability of the HR, which may be related to several factors, including psychological aspects. Regarding the kinematic features, the synchronization of the HR signal with the tracking, the sampling frequency of the player velocity, and the trajectory smoothing filters might account for part of the error propagated into the HR model; in addition, a larger window of velocity inputs than the one used in this study may improve the modeling. To obtain a better fit, we believe that with more games and players the neural network can improve its learning and generalization.

To further understand the in-game HR dynamics, we suggest incorporating other variables related to environmental conditions, such as temperature and humidity, and inputs describing game events: timeouts, fouls, free throws, live ball or dead ball, and technical/tactical actions. For example, rebounds, jump shots, and other vertical displacements can increase the HR, and the 2D velocity data used in this study do not capture these types of actions. Perhaps 3D velocity data can indicate vertical displacements. To improve the description of player characteristics, other anaerobic and aerobic performance tests could be used, for example: maximal lactate steady state, lactate minimum, Yo-Yo intermittent test (CASTAGNA et al., 2008), 20 m shuttle run, and fitness tests.

In this study, we explored a neural network approach to model and estimate the in-game HR. Using a neural network for HR monitoring without a device on the athlete during the game would have several uses, such as controlling the physiological load and better managing the substitution of players with high HR.

3.6 Acknowledgments

We would like to thank the CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico) for their support.


4. Discussion

The processing and analysis of basketball in competition involve aspects inherent to the game, such as unpredictability and interaction. In this sense, the relationships in the data behave nonlinearly, so the use and support of other areas of knowledge, such as machine learning with appropriate tools, become necessary. In our applications, the tool for handling this type of data was the neural network, which can learn complex functions by discovering important patterns through supervised training.

The use of machine learning is still underexplored in sports analysis and seems promising for contributing to the analysis of players and games. Moreover, the development of automatic systems to obtain information and recognize patterns in sports games may be relevant for increasing team performance and for better understanding the sports.

The relevance of the first contribution, which presents a methodology for 3D localization of basketball players, lies in combining accuracy with automation; only then is it possible to use the proposed methodology to obtain the player position information that will be used in technical, physical, and tactical analysis. Obtaining this type of data is only viable if it does not demand substantial time and human resources, which rules out manual measurement. The data also cannot be used if they are not valid; in this sense, an error of 0.25 m, approximately the diameter of a basketball player's head, is considered acceptable given the size of a basketball court, and the data can be used for referencing positions on the court and for game analysis.

The second contribution explored the use of a neural network to estimate the heart rate of players during the game, with kinematics and individual attributes as input variables. The results showed that there is still a gap in the variables needed to explain the behavior of the heart rate, given that even psychological factors have an influence; nevertheless, modeling with a neural network seems appropriate in the search for a better heart rate estimation model.


5. Conclusion

In the present work, relevant contributions to the sports sciences were presented for the development of video-based methods with the aid of tools from the fields of machine learning, computer vision, and image processing. The results of the proposed approaches point to a methodological development that is crucial for the processing and analysis of basketball, enabling investigations in the context of competition.


6. References

ABDEL-AZIZ, Y. I.; KARARA, H. M. Direct Linear Transformation from Comparator Coordinates Into Object Space Coordinates in Close-range Photogrammetry. [s.l: s.n.].

ALAHI, A. et al. Sport players detection and tracking with a mixed network of planar and omnidirectional cameras. Third ACM/IEEE International Conference on Distributed Smart Cameras, 2009. ICDSC 2009. Proceedings... In: THIRD ACM/IEEE INTERNATIONAL CONFERENCE ON DISTRIBUTED SMART CAMERAS, 2009. ICDSC 2009. Aug. 2009.

BARROS, R. M. L. et al. Analysis of the distances covered by first division Brazilian soccer players obtained with an automatic tracking method. Journal of Sports Science and Medicine, v. 6, n. 2, p. 233–242, 2007.

BARROS, R. M. L. et al. Measuring handball players trajectories using an automatically trained boosting algorithm. Computer Methods in Biomechanics and Biomedical Engineering, v. 14, n. 1, p. 53–63, Feb. 2011.

CASTAGNA, C. et al. Relation between maximal aerobic power and the ability to repeat sprints in young basketball players. The Journal of Strength & Conditioning Research, v. 21, n. 4, p. 1172–1176, 2007.

CASTAGNA, C. et al. The Yo-Yo intermittent recovery test in basketball players. Journal of Science and Medicine in Sport / Sports Medicine Australia, v. 11, n. 2, p. 202–208, Apr. 2008.

CASTAGNA, C. et al. Validity of an on-court lactate threshold test in young basketball players. Journal of Strength and Conditioning Research / National Strength & Conditioning Association, v. 24, n. 9, p. 2434–2439, Sep. 2010.

CHEN, H.-T. et al. Recognizing tactic patterns in broadcast basketball video using player trajectory. Journal of Visual Communication and Image Representation, v. 23, n. 6, p. 932–947, Aug. 2012.

DALAL, N.; TRIGGS, B. Histograms of oriented gradients for human detection. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005. CVPR 2005. Proceedings... In: IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, 2005. CVPR 2005. Jun. 2005.

DELANNAY, D.; DANHIER, N.; DE VLEESCHOUWER, C. Detection and recognition of sports(wo)men from multiple views. In: DISTRIBUTED SMART CAMERAS, 2009. ICDSC 2009. THIRD ACM/IEEE INTERNATIONAL CONFERENCE ON. 30 Aug. 2009.

DENG, L.; YU, D. Deep Learning: Methods and Applications. [s.l.] Now Publishers Inc, 2014.

FIGUEROA, P. J.; LEITE, N. J.; BARROS, R. M. L. Tracking soccer players aiming their kinematical motion analysis. Computer Vision and Image Understanding, v. 101, n. 2, p. 122–135, Feb. 2006a.

FIGUEROA, P. J.; LEITE, N. J.; BARROS, R. M. L. Background recovering in outdoor image sequences: An example of soccer players segmentation. Image and Vision Computing, v. 24, n. 4, p. 363–374, 1 Apr. 2006b.

GARCÍA-MASSÓ, X. et al. Neural Network for Estimating Energy Expenditure in Paraplegics from Heart Rate. International Journal of Sports Medicine, v. 35, n. 12, p. 1037–1043, 2 Jun. 2014.

GOMEZ, G. et al. Tracking of Ball and Players in Beach Volleyball Videos. PLoS ONE, v. 9, n. 11, p. e111730, 26 Nov. 2014.

GRAY, A. J. et al. Validity and reliability of GPS for measuring distance travelled in field-based team sports. Journal of Sports Sciences, v. 28, n. 12, p. 1319–1325, 1 Oct. 2010.

HOPKINS, W. G.; HAWLEY, J. A.; BURKE, L. M. Design and analysis of research on sport performance enhancement. Medicine & Science in Sports & Exercise, v. 31, n. 3, p. 472–485, Mar. 1999.

INTILLE, S. S.; BOBICK, A. F. Visual Tracking Using Closed-Worlds. [s.l.] In Proceedings of International Conference on Computer Vision, 1995.

IVANKOVIC, Z. et al. AdaBoost in basketball player identification. 2012 IEEE 13th International Symposium on Computational Intelligence and Informatics (CINTI). Proceedings... In:

2012 IEEE 13TH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL