Frameworks for trading stocks in the financial market using data mining techniques and multi-swarm optimization algorithms for dynamic and multimodal environments

Universidade Federal de Pernambuco posgraduacao@cin.ufpe.br http://cin.ufpe.br/~posgraduacao

Recife 2019


Frameworks for trading stocks in the financial market using data mining techniques and multi-swarm optimization algorithms for dynamic and multimodal environments

Tese apresentada ao Programa de Pós-Graduação em Ciência da Computação da Universidade Federal de Pernambuco, como requisito parcial para a obtenção do título de Doutor em Ciência da Computação.

Área de Concentração: Inteligência Computacional, Otimização Multi-enxames e Problemas de Otimização Dinâmica.

Orientador: Adriano Lorena Inácio de Oliveira

Recife 2019


B823f Brasileiro, Rodrigo de Carvalho.

Frameworks for trading stocks in the financial market using data mining techniques and multi-swarm optimization algorithms for dynamic and multimodal environments / Rodrigo de Carvalho Brasileiro. – 2019.

141 f.: il., fig., tab.

Orientador: Adriano Lorena Inácio de Oliveira.

Tese (Doutorado) – Universidade Federal de Pernambuco. CIn, Ciência da Computação, Recife, 2019.

Inclui referências e apêndice.

1. Inteligência computacional. 2. Mineração de dados. 3. Fluxo contínuo de dados. I. Oliveira, Adriano Lorena Inácio de (orientador). II. Título.

CDD 006.3 (23. ed.)    UFPE-MEI 2019-144


RODRIGO DE CARVALHO BRASILEIRO

“Frameworks for trading stocks in the financial market using data mining techniques and multi-swarm optimization algorithms for dynamic and multimodal environments”

Tese de Doutorado apresentada ao Programa de Pós-Graduação em Ciência da Computação da Universidade Federal de Pernambuco, como requisito parcial para a obtenção do título de Doutor em Ciência da Computação.

Aprovado em: 23/05/2019.

__________________________________________

Orientador: Prof. Dr. Adriano Lorena Inácio de Oliveira

BANCA EXAMINADORA

____________________________________________________
Prof. Dr. Aluizio Fausto Ribeiro Araújo (Examinador Interno)
Centro de Informática / UFPE

____________________________________________________
Prof. Dr. Cleber Zanchettin (Examinador Interno)
Centro de Informática / UFPE

____________________________________________________
Prof. Dr. Ricardo de Andrade Araújo (Examinador Externo)
Instituto Federal do Sertão Pernambucano / IFSERTÃO-PE

____________________________________________________
Prof. Dr. Bruno José Torres Fernandes (Examinador Externo)
Escola Politécnica de Pernambuco / UPE

____________________________________________________
Profa. Dra. Nadia Nedjah (Examinadora Externa)


Em primeiro lugar agradeço a Deus pela saúde plena concedida e motivação ao longo de quase uma década de trabalho no CIn (mestrado e doutorado). Agradeço às diversas pessoas que apoiaram direta ou indiretamente o desenvolvimento deste trabalho, seja no âmbito pessoal, acadêmico ou profissional.

Por ser a pessoa que mais acreditou, incentivou e apoiou em todos os momentos, um agradecimento mais que especial à minha esposa, Roberta Souza.

Ao meu orientador, Adriano Oliveira, por ter contribuído diretamente com minha pesquisa e, principalmente com minha formação desde o mestrado até o doutorado. Agradeço pela oportunidade concedida e o aprendizado que obtive durante todo este tempo de trabalho.

Agradeço especialmente aos meus pais Ricardo e Suely por todo o tempo dedicado para minha criação e ao meu irmão Ricardo, que sabe das dificuldades que enfrentamos desde cedo.

Aos meus familiares que acreditaram neste trabalho, inclusive aos que não conseguiram ver este trabalho e já se foram há muitos anos, em especial a minha querida e saudosa avó Yolanda Régis.

Ao meu cunhado Victor Souza que me ajudou muito neste trabalho.

Também agradeço a todos os meus amigos e colegas de trabalho que me cobriram durante minha ausência, além de todo o tempo cedido para que eu pudesse terminar este trabalho.

Aos Professores Aluizio Fausto Ribeiro Araújo, Cleber Zanchettin, Ricardo de Andrade Araújo, Bruno José Torres Fernandes e Nadia Nedjah que aceitaram fazer parte da banca examinadora. Também agradeço ao Professor Carmelo Bastos, que mesmo não participando desta banca sempre esteve disponível.

Por fim, aos Professores e também colegas de trabalho das Faculdades Integradas Barros Melo (AESO), além de todo o quadro de funcionários.


"It ain’t about how hard you hit. It’s about how hard you can get hit and keep moving forward. How much you can take and keep moving forward. That’s how winning is done!" (STALLONE, 2006)


ABSTRACT

Financial time series behave similarly to a data stream, that is, a set of input elements that arrive continuously and sequentially over time. Hence, a time series may present concept drifts, i.e., changes in the data-generating process. This phenomenon negatively affects forecasting methods that rely on observing the past behavior of the series to predict future values. Many papers report the use of data mining techniques and computational intelligence to predict the future direction of stock prices, uncovering patterns in time series data to support decision making for financial market operations. The traditional optimization algorithms proposed in the literature generally assume that the environment is static, i.e., that the distribution generating the time series data remains the same over the period of interest. Another problem is that these methods sometimes do not take into account the possibility that the function to be optimized has multiple peaks and, in this case, is represented by a multimodal function. However, multimodality is one of the known features of real-world financial time series optimization problems. Furthermore, several methods involving optimization algorithms have been proposed in the literature; however, most of them do not consider real-world problems. The main contribution of this work is a decision support system capable of dealing with concept drifts and multimodality in the financial time series environment. To achieve this goal, we propose two models that aim to find patterns in financial time series, using multi-swarms to improve particle initialization and thus avoid local optima in the final optimization phase. In addition, the models use a validation step with an early stopping criterion to avoid overfitting. In contrast to the first proposed model, the second one considers two consecutive generations of populations to detect changes in the time series; a statistical test is then used to check for changes in the environment, avoiding false positives. Once a change is detected, the second model performs a series of actions to find new patterns, replacing obsolete ones. The patterns discovered by the models are used in conjunction with the proposed investment rules to support decisions and help investors maximize the profit of their stock market operations. Experiments using 82 stocks from the S&P100 index were evaluated with a confidence level of 95%, showing that the proposed method is able to improve results when concept drifts are considered.

Keywords: Multi-Swarm optimization. Pattern discovery. Data mining. Time series representation. Particle swarm optimization. Dynamic optimization problems.


RESUMO

As séries temporais financeiras comportam-se de forma semelhante a um fluxo de dados, ou seja, um conjunto de elementos de entrada que chegam de forma contínua e sequencial ao longo do tempo. Então, uma série temporal pode apresentar uma mudança de conceito, que é a mudança no processo gerador dos dados. Este fenômeno afeta negativamente os métodos de previsão que se baseiam na observação do comportamento passado da série para prever valores futuros. Muitos trabalhos relatam o uso de técnicas de mineração de dados e inteligência computacional para prever a direção futura dos preços das ações, descobrindo padrões nos dados das séries temporais para fornecer suporte a decisões para as operações realizadas no mercado financeiro. Os algoritmos de otimização tradicionais propostos na literatura geralmente consideram que o ambiente é estático, supondo que a distribuição geradora dos dados das séries temporais seja a mesma ao longo do período de interesse. Outro problema é que algumas vezes estes métodos não levam em consideração a possibilidade de a função a ser otimizada ter múltiplos picos e, neste caso, ser representada através de funções multimodais. No entanto, a multimodalidade é uma das características conhecidas dos problemas de otimização em séries temporais financeiras no mundo real. Além disso, diversos métodos envolvendo algoritmos de otimização foram propostos na literatura; no entanto, a maior parte deles não leva em consideração os problemas do mundo real. A principal contribuição deste trabalho é um sistema de suporte à decisão capaz de tratar as mudanças de conceito no ambiente e a multimodalidade das séries temporais financeiras. Para atingir este objetivo, foram propostos dois modelos que visam encontrar padrões em séries temporais financeiras, usando multi-enxames para melhorar a inicialização das partículas e, portanto, evitar ótimos locais na fase final da otimização. Além disso, os modelos usam um conjunto de validação com o critério de parada antecipada para evitar overfitting. Em contraste com o primeiro modelo proposto, o segundo considera duas gerações consecutivas das populações para detectar as mudanças nas séries temporais; posteriormente, um teste estatístico é usado para verificar se houve mudanças no ambiente, procurando evitar falsos positivos. Após detectar uma mudança, o segundo modelo executa uma série de medidas com o objetivo de encontrar novos padrões, substituindo os obsoletos. Os padrões descobertos pelos modelos são usados em conjunto com as regras de investimento propostas para apoiar as decisões e ajudar os investidores a maximizar o lucro em suas operações no mercado de ações. Experimentos usando 82 ações do índice S&P100 foram testados com nível de confiança de 95%, mostrando que o método proposto é capaz de melhorar os resultados quando as mudanças de conceito são consideradas.

Palavras-chave: Otimização Multi-enxames. Descoberta de padrões. Mineração de dados. Representação de séries temporais. Otimização de enxame de partículas. Problemas de otimização dinâmica.

Fig. 1 – Time series related to the stocks Itaúsa Investments (ITSA4), Coca-Cola Company (KO), Light S.A. (LIGT3) and M.D.C. Holdings (MDC). . . 23
Fig. 2 – General overview of the thesis, where boxes represent the chapters and other contributions of the thesis. The box in blue presents the contributions on the topic of optimization based on multi-swarm and optimization for dynamic environments. The box in green shows the contributions as co-author on the topics of optimization. . . 29
Fig. 3 – PAA representation of a time series Q. In this example, PAA parameters are n = 15, k = 5. . . 37
Fig. 4 – Statistical standardization of the time series data. . . 37
Fig. 5 – PAA segments when the relationship 𝑛/𝑘 does not result in an integer. In this example, PAA parameters are 𝑛 = 12, 𝑘 = 5, which means that each segment must have 2.4 points of contribution to the average. . . 38
Fig. 6 – Computing MINDIST between two PAA time series representations, 𝑄 and 𝐶. . . 38
Fig. 7 – The time series is discretized by obtaining its PAA representation, then using the predetermined breakpoints to map the PAA segments to SAX symbols. In this example, with 𝑛 = 80, 𝑘 = 5 and 𝛼 = 3, the time series is mapped to the “word” baabc. . . 40
Fig. 8 – Representation of a unimodal function with a global optimum. . . 41
Fig. 9 – Representation of a multimodal function with a global optimum and several local optima. . . 42
Fig. 10 – Representation of a multimodal function. . . 57
Fig. 11 – Representation of the particle structure. . . 69
Fig. 12 – Decision rules. . . 69
Fig. 13 – Architecture of the proposed approach. . . 70
Fig. 14 – History of the S&P100 index between 1979 and 2019. . . 75
Fig. 15 – Boxplot of the results obtained by the models in their 50 runs without drawdown, using 50 particles, for the ALL stock. . . 77
Fig. 16 – Comparison of the monthly return on investment between Buy-and-hold and PAA-MS-IDPSO-V for the ALL stock without drawdown, using 50 particles. . . 78
Fig. 17 – Histogram of the profits/losses of the GS stock over the 50 runs of the model PAA-MS-IDPSO-V without drawdown with 50 particles. . . 78
Fig. 18 – Convergence analysis of PAA-IDPSO and PAA-MS-IDPSO models for CELG and CVS stocks, considering the average of 50 executions. . . 79
Fig. 19 – Nemenyi post-hoc test for the models PAA-MS-IDPSO-V, PAA-MS-IDPSO, PAA-IDPSO-V, PAA-IDPSO (all using 25 particles without risk control), Buy-and-Hold and Teixeira and Oliveira (2010) for all analyzed stocks. . . 80
Fig. 20 – Nemenyi post-hoc test for the models PAA-MS-IDPSO-V, PAA-MS-IDPSO, PAA-IDPSO-V, PAA-IDPSO (all using 50 particles without risk control), Buy-and-Hold and Teixeira and Oliveira (2010) for all analyzed stocks. . . 82
Fig. 21 – Nemenyi post-hoc test for the models PAA-MS-IDPSO-V, PAA-MS-IDPSO, PAA-IDPSO-V, PAA-IDPSO (all using 100 particles without risk control), Buy-and-Hold and Teixeira and Oliveira (2010) for all analyzed stocks. . . 82
Fig. 22 – Nemenyi post-hoc test for the models PAA-MS-IDPSO-V, PAA-MS-IDPSO, PAA-IDPSO-V, PAA-IDPSO (all using 25 particles with risk control), Buy-and-Hold and Teixeira and Oliveira (2010) for all analyzed stocks. . . 82
Fig. 23 – Nemenyi post-hoc test for the models PAA-MS-IDPSO-V, PAA-MS-IDPSO, PAA-IDPSO-V, PAA-IDPSO (all using 50 particles with risk control), Buy-and-Hold and Teixeira and Oliveira (2010) for all analyzed stocks. . . 83
Fig. 24 – Nemenyi post-hoc test for the models PAA-MS-IDPSO-V, PAA-MS-IDPSO, PAA-IDPSO-V, PAA-IDPSO (all using 100 particles with risk control), Buy-and-Hold and Teixeira and Oliveira (2010) for all analyzed stocks. . . 83
Fig. 25 – Nemenyi post-hoc test for the models PAA-MS-IDPSO-V using 25, 50 and 100 particles for all analyzed stocks. . . 83
Fig. 26 – Nemenyi post-hoc test for the PAA-MS-IDPSO-V with and without risk control, using 50 particles. . . 84
Fig. 27 – Architecture of the proposed approach. . . 90
Fig. 28 – PAA-MS-HmSO-V vs PAA-MS-IDPSO-V vs Buy-and-hold for the AMGN stock. . . 95
Fig. 29 – ROI of the DD stock (considering the 50 simulations) for both the PAA-MS-HmSO-V and the PAA-MS-IDPSO-V approaches. . . 96
Fig. 30 – ROI of the BMY stock (considering the 50 simulations) for the PAA-MS-HmSO-V approach. . . 96
Fig. 31 – Convergence analysis of PAA-HmSO and PAA-MS-HmSO models for AGN and AIG stocks, considering the average of 50 executions. . . 97
Fig. 32 – Convergence analysis of PAA-HmSO and PAA-MS-HmSO models for BAC and BIIB stocks, considering the average of 50 executions. . . 97
Fig. 33 – Nemenyi post-hoc test for the models PAA-HmSO, PAA-IDPSO and Buy-and-hold for all analyzed stocks. . . 99
Fig. 34 – Nemenyi post-hoc test for the models PAA-HmSO-V, PAA-IDPSO-V and Buy-and-hold for all analyzed stocks. . . 99
Fig. 35 – Nemenyi post-hoc test for the models PAA-MS-HmSO, PAA-MS-IDPSO and Buy-and-hold for all analyzed stocks. . . 99
Fig. 36 – Nemenyi post-hoc test for the models PAA-MS-HmSO-V, PAA-MS-IDPSO-V and Buy-and-hold for all analyzed stocks. . . 100
Fig. 37 – Nemenyi post-hoc test for the models PAA-MS-HmSO-V, PAA-MS-IDPSO-V, PAA-MS-HmSO, PAA-MS-IDPSO, PAA-HmSO-V, PAA-IDPSO-V, PAA-HmSO, PAA-IDPSO and Buy-and-Hold for all analyzed stocks. . . 100
Fig. 38 – Nemenyi post-hoc test for the models PAA-MS-HmSO-V, PAA-MS-IDPSO-V, Random Forest, SVM, k-NN and Buy-and-Hold for all analyzed stocks. . . 100
Fig. 39 – Boxplot of the results obtained by the models (only Multi-Swarms online vs offline) in their 50 runs, using 25 particles, for the MSFT stock. . . 101
Fig. 40 – Boxplot of the results obtained by the models in their 50 runs, using 25 particles, for the ALL stock. . . 101
Fig. 41 – Convergence analysis of the PAA-IDPSO, PAA-MS-IDPSO, PAA-HmSO and PAA-MS-HmSO models for the ABT stock, considering the average of 50 executions. . . 129
Fig. 42 – Convergence analysis of the PAA-IDPSO, PAA-MS-IDPSO, PAA-HmSO and PAA-MS-HmSO models for the ACN stock, considering the average of 50 executions. . . 130
Fig. 43 – Convergence analysis of the PAA-IDPSO, PAA-MS-IDPSO, PAA-HmSO and PAA-MS-HmSO models for the ALL stock, considering the average of 50 executions. . . 130
Fig. 44 – Convergence analysis of the PAA-IDPSO, PAA-MS-IDPSO, PAA-HmSO and PAA-MS-HmSO models for the BA stock, considering the average of 50 executions. . . 131
Fig. 45 – Convergence analysis of the PAA-IDPSO, PAA-MS-IDPSO, PAA-HmSO and PAA-MS-HmSO models for the BAC stock, considering the average of 50 executions. . . 131
Fig. 46 – Convergence analysis of the PAA-IDPSO, PAA-MS-IDPSO, PAA-HmSO and PAA-MS-HmSO models for the BIIB stock, considering the average of 50 executions. . . 132
Fig. 47 – Convergence analysis of the PAA-IDPSO, PAA-MS-IDPSO, PAA-HmSO and PAA-MS-HmSO models for the BK stock, considering the average of 50 executions. . . 132
Fig. 48 – Convergence analysis of the PAA-IDPSO, PAA-MS-IDPSO, PAA-HmSO and PAA-MS-HmSO models for the BLK stock, considering the average of 50 executions. . . 133
Fig. 49 – Convergence analysis of the PAA-IDPSO, PAA-MS-IDPSO, PAA-HmSO and PAA-MS-HmSO models for the CAT stock, considering the average of 50 executions. . . 133
Fig. 50 – Convergence analysis of the PAA-IDPSO, PAA-MS-IDPSO, PAA-HmSO and PAA-MS-HmSO models for the CELG stock, considering the average of 50 executions. . . 134
Fig. 51 – Convergence analysis of the PAA-IDPSO, PAA-MS-IDPSO, PAA-HmSO and PAA-MS-HmSO models for the CL stock, considering the average of 50 executions. . . 134
Fig. 52 – Convergence analysis of the PAA-IDPSO, PAA-MS-IDPSO, PAA-HmSO and PAA-MS-HmSO models for the CMCSA stock, considering the average of 50 executions. . . 135
Fig. 53 – Convergence analysis of the PAA-IDPSO, PAA-MS-IDPSO, PAA-HmSO and PAA-MS-HmSO models for the COF stock, considering the average of 50 executions. . . 135
Fig. 54 – Convergence analysis of the PAA-IDPSO, PAA-MS-IDPSO, PAA-HmSO and PAA-MS-HmSO models for the CVS stock, considering the average of 50 executions. . . 136
Fig. 55 – Convergence analysis of the PAA-IDPSO, PAA-MS-IDPSO, PAA-HmSO and PAA-MS-HmSO models for the CVX stock, considering the average of 50 executions. . . 136
Fig. 56 – Convergence analysis of the PAA-IDPSO, PAA-MS-IDPSO, PAA-HmSO and PAA-MS-HmSO models for the DD stock, considering the average of 50 executions. . . 137
Fig. 57 – Convergence analysis of the PAA-IDPSO, PAA-MS-IDPSO, PAA-HmSO and PAA-MS-HmSO models for the DHR stock, considering the average of 50 executions. . . 137
Fig. 58 – Convergence analysis of the PAA-IDPSO, PAA-MS-IDPSO, PAA-HmSO and PAA-MS-HmSO models for the DIS stock, considering the average of 50 executions. . . 138
Fig. 59 – Convergence analysis of the PAA-IDPSO, PAA-MS-IDPSO, PAA-HmSO and PAA-MS-HmSO models for the EXC stock, considering the average of 50 executions. . . 138
Fig. 60 – Convergence analysis of the PAA-IDPSO, PAA-MS-IDPSO, PAA-HmSO and PAA-MS-HmSO models for the F stock, considering the average of 50 executions. . . 139
Fig. 61 – Convergence analysis of the PAA-IDPSO, PAA-MS-IDPSO, PAA-HmSO and PAA-MS-HmSO models for the FDX stock, considering the average of 50 executions. . . 139
Fig. 62 – Convergence analysis of the PAA-IDPSO, PAA-MS-IDPSO, PAA-HmSO and PAA-MS-HmSO models for the FOX stock, considering the average of 50 executions. . . 140
Fig. 63 – Convergence analysis of the PAA-IDPSO, PAA-MS-IDPSO, PAA-HmSO and PAA-MS-HmSO models for the FOXA stock, considering the average of 50 executions. . . 140

Table 1 – Breakpoints dividing a Gaussian distribution into equiprobable regions, considering the number of regions from 3 to 10. . . 39
Table 2 – List of stocks used in the experiments. . . 73
Table 3 – List of stocks used in the experiments. . . 74
Table 4 – Mean and standard deviation of computational costs in seconds for the PAA-IDPSO, PAA-IDPSO-V, PAA-MS-IDPSO, and PAA-MS-IDPSO-V models. . . 81
Table 5 – Mean and standard deviation of computational costs in seconds for the PAA-HmSO, PAA-HmSO-V, PAA-MS-HmSO, and PAA-MS-HmSO-V models. . . 98
Table 6 – Average result of the financial return over the testing period. Comparison between Buy-and-hold, static and dynamic models. . . 123
Table 7 – Average result of the financial return over the testing period. Comparison between Buy-and-hold, static and dynamic models. . . 124
Table 8 – Average result of the financial return over the testing period. Comparison between Buy-and-hold, static and dynamic models. . . 125
Table 9 – Average result of the financial return over the testing period. Comparison between Buy-and-hold, static and dynamic models. . . 126
Table 10 – Average costs and standard deviation (in seconds), considering the 50

ABC Artificial Bee Colony

ACO Ant Colony Optimization

AFSO Artificial Fish Swarm Optimization

ARV Average Relative Variance

BFO Bacterial Foraging Optimization

BSR Blockwise Strong Relationship

CDDE-Ar Cluster-based Dynamic Differential Evolution with external Archive

CI Computational Intelligence

COEA competitive-cooperation coevolutionary algorithm

DA Directional accuracy

DOFPSO Particle Swarm Optimization in dynamic objective function environment

DPP-PSO Dynamic particle population based particle swarm optimization

DSSs Decision Support Systems

DTSP Dynamic Travelling Salesman Problem

DTW Dynamic time warping

EA Evolutionary Algorithms

EMA Exponential Moving Average

eTrend Hybrid Evolutionary Trend Following Algorithm

eVL Virtual Loser

eVLGA Extended Virtual Loser Genetic Algorithm

FA Firefly Algorithm

FFT Fast Fourier Transform

FGA Forking Genetic Algorithm

FOREX Foreign Exchange

FT Fourier transform

GA Genetic Algorithm


Environments

IDPSO Improved Self-adaptive Particle Swarm Optimization algorithm

k-NN k–Nearest Neighbors

LDEs Less Detectable Environments

LPBAN Legendre Polynomial Based Adaptive Nonlinear

MACD Moving Average Convergence Divergence

MAPE Mean Absolute Percentage Error

MDL Minimum Description Length

MINDIST Minimum Distance

MOPSO Multi-Objective Particle Swarm Optimization

mPSO Multiswarm Particle Swarm Optimization

NSGA-II Non-dominated Sorting Genetic Algorithm

OBV On Balance Volume

P-ACO Population-based ACO

PAA Piecewise Aggregate Approximation

PIP Perceptually Important Points

PSO Particle Swarm Optimization

ROC Rate of Change

ROI Return On Investment

RSI Relative Strength Index

SAX Symbolic Aggregate Approximation

SI Swarm Intelligence

SIDO Swarm Intelligence Dynamic Optimization

SVD Singular Value Decomposition

SVM Support Vector Machine

SVR Support Vector Regression

TF Trend Following

TI Technical Indicators

TSs Trading or Investment Strategies


𝜇 Adjustment factor

𝜋 Number of particles in the child swarm

𝜇𝑥 Average

𝑝𝑔 Best position found by the whole swarm (global best)

𝑐𝑏𝑒𝑠𝑡 Best position found by the child swarm

𝑐𝑏𝑒𝑠𝑡𝑝𝑎𝑟𝑒𝑛𝑡 Best position found by the parent swarm

𝑝𝑖 Best position found by the particle (local best)

𝛽𝑗 Breakpoints dividing a Gaussian distribution

𝑐1 Cognitive parameter

𝜙(𝑡) Detection function

𝑤𝑓𝑖𝑛𝑎𝑙 Final inertial weight

𝑓 Fitness function

𝑤 Inertial weight

𝑤𝑖𝑛𝑖𝑡𝑖𝑎𝑙 Initial inertial weight

𝑟𝑒𝑥𝑐𝑙 Minimum allowed distance between two child swarms

𝑟𝑎𝑑𝑖𝑢𝑠𝑐 Maximum Euclidean distance between any two particles

𝐾𝑚𝑎𝑥 Maximum number of iterations

𝛼 Number of elements of the Alphabet

𝑘 Number of segments

𝑥𝑖 Point in the standardized series window

𝑥𝑖 Position vector of the particle

𝑟𝑠 Radius of random search

𝑟 Radius of the hypersphere

𝑐2 Social parameter


𝜎𝑥 Standard deviation

𝜉 Threshold

𝑚 Time series size


1 INTRODUCTION . . . 22

1.1 MOTIVATION . . . 26

1.2 DEFINITION OF THE PROBLEM . . . 27

1.3 OBJECTIVES . . . 28

1.4 CONTRIBUTIONS . . . 28

1.5 ORGANIZATION . . . 29

2 LITERATURE REVIEW . . . 31

2.1 DECISION SUPPORT SYSTEMS FOR STOCK TRADING AND DATA STREAMS . . . 31

2.2 TIME SERIES DATA MINING . . . 34

2.2.1 State of the art: Dimensionality reduction . . . 35

2.2.2 Piecewise Aggregate Approximation (PAA) . . . 36

2.2.3 Symbolic Aggregate approXimation (SAX) . . . 39

2.3 MULTIMODAL FUNCTIONS . . . 41

2.4 OPTIMIZATION ALGORITHMS . . . 42

2.4.1 Optimization algorithms based on swarm intelligence . . . 42

2.4.2 Particle Swarm Optimization (PSO) . . . 43

2.4.3 Improved Self-Adaptive Particle Swarm Optimization (IDPSO) . . . 44

2.5 DYNAMIC OPTIMIZATION . . . 47

2.5.1 State of the art: Dynamic optimization . . . 48

2.5.2 Change Detection Methods . . . 50

2.5.2.1 Hybrid method for change detection . . . 51

2.5.3 Approaches used after the change detection . . . 52

2.5.3.1 Introducing diversity when changes occur . . . 52

2.5.3.2 Maintaining diversity during the search. . . 53

2.5.3.3 Memory approaches. . . 54

2.5.3.4 Prediction approaches . . . 55

2.5.3.5 Multi-population approaches . . . 56

2.5.4 Summary of strengths and weaknesses of approaches used to deal with Dynamic optimization problems (DOPs) . . . 60

2.5.5 Hibernating multi-Swarm Optimization (HmSO) . . . 60

2.6 VALIDATION AND EARLY STOPPING CRITERIA . . . 64

3 PIECEWISE AGGREGATE APPROXIMATION - MULTI-SWARM OF IMPROVED SELF-ADAPTIVE PARTICLE SWARM OPTIMIZATION WITH VALIDATION (PAA-MS-IDPSO-V) . . . 67

3.1 THE PROPOSED METHOD . . . 67

3.1.1 Multi-swarm architecture . . . 70
3.1.2 Reducing the variance . . . 71

3.2 EXPERIMENTAL EVALUATION . . . 72

3.2.1 Datasets . . . 72
3.2.2 Configuration parameters . . . 75
3.2.3 Results . . . 77

3.3 SUMMARY . . . 84

4 PIECEWISE AGGREGATE APPROXIMATION - MULTI-SWARM ARCHITECTURE OF HIBERNATING MULTI-SWARM OPTIMIZATION ALGORITHM FOR DYNAMIC ENVIRONMENTS WITH VALIDATION (PAA-MS-HMSO-V) . . . 87

4.1 THE PROPOSED METHOD . . . 87

4.1.1 Multi-Swarm architecture . . . 89
4.1.2 Detecting changes in the environment and reducing variance . . . 91

4.2 EXPERIMENTAL EVALUATION . . . 92

4.2.1 Datasets . . . 92
4.2.2 Configuration parameters . . . 92
4.2.3 Results . . . 94

4.3 SUMMARY . . . 102

5 CONCLUSIONS AND FUTURE WORK . . . 104

5.1 FUTURE WORK . . . 105

REFERENCES . . . 107

APPENDIX A – TABLES CONTAINING THE AVERAGE RESULTS OF FINANCIAL RETURNS AT THE END OF THE EXPERIMENTS . . . 122

APPENDIX B – CONSOLIDATED COMPUTATIONAL COSTS OF THE PROPOSED MODELS . . . 127

APPENDIX C – CONVERGENCE ANALYSIS OF THE PROPOSED MODELS


1 INTRODUCTION

A time series is a set of observations that arrive sequentially in time (BOX; JENKINS; REINSEL, 2011; KANNAN et al., 2010; METCALFE; COWPERTWAIT, 2009). Several dynamic processes can be modeled through time series, such as stock price movement (KIM, 2003; TAY; CAO, 2002; BARIGOZZI; BROWNLEES, 2019; BAO; YUE; RAO, 2017), company payroll (PHILLIPS; SLIJK, 2017), product sales (POLE; WEST; HARRISON, 2018; KECHYN et al., 2018), daily temperature (YOO et al., 2018), electricity consumption (DEB et al., 2017; GONZÁLEZ; ROQUE; PEREZ, 2017), exchange rates (SHEN; CHAO; ZHAO, 2015), operations on mobile applications (LAPTEV et al., 2017), and more. Time series forecasting can be considered a challenging task and can be tackled using traditional statistical models or Computational Intelligence (CI) (WANG et al., 2011) methods. While statistical models assume that time series are generated from a parametric process (KUMAR; MURUGAN, 2013), approaches involving Computational Intelligence are self-adaptive methods capable of capturing the linear and nonlinear behavior of time series without the need for a priori specific statistical assumptions about the data (LU; LEE; CHIU, 2009). Fig. 1 depicts the financial time series of the stocks Itaúsa Investments (ITSA4), Coca-Cola Company (KO), Light S.A. (LIGT3) and M.D.C. Holdings (MDC) considering the period between 2000 and 2010. Fluctuations in the stock prices can be seen, which may have multiple causes, such as:

• Unexpected result of a company’s balance sheet;

• An increase or decrease in the value of the dollar or in the commodity prices that can influence financial results;

• Other information that directly impacts a specific company.

Although there is a vast literature on time series forecasting, most of the proposed approaches do not consider that the time series behaves like a data stream (CAVALCANTE et al., 2016). A data stream is a set of sequentially arriving data (one by one), with dynamism as a characteristic inherent in the data. This dynamism can lead to changes in the data stream patterns over time. It poses a major challenge to traditional learning algorithms which, in general, do not evolve even in the presence of changes in the underlying data generation process. This phenomenon is referred to as concept drift (SCHLIMMER; GRANGER, 1986), concept shift (LUGHOFER; ANGELOV, 2011) or dataset shift (RAZA; PRASAD; LI, 2015). Another characteristic of time series is the large amount of data available. This high dimensionality can make it difficult to detect concept drifts due to increased computational cost and storage space.
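As a minimal illustration of the phenomenon (a synthetic example, not an experiment from this thesis), the sketch below builds a stream whose generating process changes halfway through; a naive forecaster fitted on the first concept degrades once the drift occurs:

import numpy as np

# Synthetic stream: the data-generating process drifts at t = 500
# (the mean of the series shifts), simulating a concept drift.
rng = np.random.default_rng(42)
stream = np.concatenate([
    rng.normal(loc=10.0, scale=1.0, size=500),   # concept 1
    rng.normal(loc=14.0, scale=1.0, size=500),   # concept 2 (after the drift)
])

# A naive forecaster trained only on the first concept predicts its mean.
prediction = stream[:500].mean()
mae_before = np.abs(stream[:500] - prediction).mean()
mae_after = np.abs(stream[500:] - prediction).mean()
print(f"MAE before drift: {mae_before:.2f}; after drift: {mae_after:.2f}")
# The error grows several-fold after the change, which is why methods that only
# observe past behavior must detect and react to concept drifts.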


Fig. 1 – Time series related to the stocks Itaúsa Investments (ITSA4), Coca-Cola Company (KO), Light S.A. (LIGT3) and M.D.C. Holdings (MDC).

In this context, data mining techniques can be used. These approaches can be defined as processes of finding unknown patterns in data, which, once found, can be used to build predictive models. Data mining has been used extensively by financial institutions for credit analysis and fraud detection, in marketing, and by manufacturers for quality control and maintenance scheduling, among other applications (KOH; TAN et al., 2011; HAND, 2006; WITTEN et al., 2016; ROIGER, 2017; CHEN et al., 2015; DUA; DU, 2016; SIN; MUTHU, 2015; AGRAWAL; AGRAWAL, 2015; FENG; BARBOSA; TORRES, 2016).

Also, data mining techniques can be used to reduce the dimensionality of problems involving large amounts of data, such as time series (FU, 2011). In this scenario, representation methods are among the most common approaches and allow one (i) to reduce the data dimensionality and (ii) to compute both the correlation and the similarity between time series from the transformed space with fewer dimensions (FUAD; MARTEAU, 2013).

In data mining applied to time series, Symbolic Aggregate Approximation (SAX) (LIN et al., 2003) and Piecewise Aggregate Approximation (PAA) (KEOGH et al., 2001) are representation techniques capable of performing dimensionality reduction. SAX is a PAA-based algorithm, but SAX has discrete representative values whereas PAA uses continuous ones (LIN et al., 2007). Both techniques use the MINDIST (LIN et al., 2007) metric, a distance defined on the reduced representations that is used to calculate the proximity between them. This metric is a lower bound of the Euclidean distance and, therefore, data mining algorithms can be used on the symbolic representation and produce results similar to the algorithms that operate on the original data (LIN et al., 2003). Thus, using techniques like SAX and PAA can speed up the process of time series data mining while maintaining the quality of the mining results. As a result of their characteristics, PAA and SAX have been used in various application domains, including the financial area.
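A minimal sketch of the PAA transform and of a lower-bounding distance between PAA representations is shown below (illustrative code under our own naming, not the implementation used in this thesis): the window is standardized, split into k segments whose means form the reduced representation, and two representations are compared with a distance that lower-bounds the Euclidean distance on the original points.

import numpy as np

def paa(series, k):
    # Piecewise Aggregate Approximation: map an n-point window to k segment means.
    x = np.asarray(series, dtype=float)
    x = (x - x.mean()) / x.std()            # statistical standardization
    n = len(x)
    # Give each point k equal-weight slots so that non-integer n/k segment
    # boundaries are handled through fractional contributions to the averages.
    return np.repeat(x, k).reshape(k, n).mean(axis=1)

def paa_dist(q_bar, c_bar, n):
    # Distance between two PAA representations of same-length windows;
    # it lower-bounds the Euclidean distance computed on the original data.
    q_bar, c_bar = np.asarray(q_bar), np.asarray(c_bar)
    k = len(q_bar)
    return np.sqrt(n / k) * np.sqrt(np.sum((q_bar - c_bar) ** 2))

# Example with two hypothetical 80-point price windows reduced to 5 segments each.
rng = np.random.default_rng(0)
q, c = rng.normal(size=80).cumsum(), rng.normal(size=80).cumsum()
print(paa(q, 5), paa_dist(paa(q, 5), paa(c, 5), n=80))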

In CI there are several research areas; among them, optimization methods such as genetic algorithms comprise one of the most popular. Optimization algorithms solve problems that are composed of an objective function and a set of constraints, both related to the decision variables. Basically, the task is to find the optimum value of the function (through maximization or minimization, depending on the problem), and the quality of the candidate solutions is evaluated through the fitness function. Due to the high complexity of many problems, the best solutions can be difficult to find (GENTLE; HÄRDLE; MORI, 2012). In general, these problems have multimodality as a characteristic, and so there may be one or more optimal solutions.

Multimodal optimization has been used in recent years to deal with real-world problems. The goal is to find the best solution (also known as the global optimum) among the infinite candidate solutions present in the search space of a multimodal function. However, some of them may lead the search algorithm to fall into so-called local optima (KRISHNANAND; GHOSE, 2009). Although traditional optimization methods are quick to converge into certain regions of the search space, they have difficulty finding the global optimum. Convergence can be understood as finding the best possible solution, i.e., a global or a local optimum. Several variations of algorithms have been proposed to deal with multimodal optimization (XU; SI; WANG, 2009).
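As a concrete illustration (a standard benchmark function, not one optimized in this thesis), the Rastrigin function

\[
f(\mathbf{x}) = 10d + \sum_{i=1}^{d}\left[x_i^{2} - 10\cos(2\pi x_i)\right], \qquad x_i \in [-5.12,\ 5.12],
\]

has a single global minimum at the origin, \(f(\mathbf{0}) = 0\), surrounded by a regular grid of local minima; a greedy search started far from the origin typically stops at one of these local optima instead of the global one.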

Traditional optimization algorithms are static and do not consider that the environment may change over time (MAVROVOUNIOTIS; LI; YANG, 2017). Thus, using these methods for time series may lead to degraded performance when changes in the environment occur. Dynamic Optimization Problems (DOPs) are a relatively recent research area that has received attention in the literature. A DOP can be defined as a sequence of instances of a static problem that must be optimized and, at the end of each instance of a given problem, there is a change in the environment (MAVROVOUNIOTIS; LI; YANG, 2017). Two aspects characterize the dynamics of changes in the environment: frequency and magnitude. Frequency is related to the speed at which the environment changes and magnitude is related to the degree to which the change occurs (MAVROVOUNIOTIS; YANG, 2013).
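A minimal sketch of such a problem is given below (an illustrative toy, not a benchmark used in this thesis): a one-dimensional peak whose position is shifted every freq fitness evaluations by a step of size magnitude, so that the optimizer has to re-locate the optimum after each change.

import numpy as np

class MovingPeak:
    # Toy dynamic optimization problem: f(x, t) = -(x - c(t))^2, where the
    # peak center c(t) moves every `freq` evaluations by `magnitude`.
    def __init__(self, freq=1000, magnitude=0.5, seed=0):
        self.freq = freq                       # change frequency (evaluations between changes)
        self.magnitude = magnitude             # change severity (size of each shift)
        self.rng = np.random.default_rng(seed)
        self.center = 0.0
        self.evals = 0

    def __call__(self, x):
        self.evals += 1
        if self.evals % self.freq == 0:        # the environment changes here
            self.center += self.rng.choice([-1.0, 1.0]) * self.magnitude
        return -(x - self.center) ** 2         # maximization: optimum at the current center

problem = MovingPeak(freq=500, magnitude=1.0)
fitness = [problem(0.0) for _ in range(2000)]  # a once-optimal solution degrades over time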

As seen, a time series can behave like a data stream; thus, dynamism is an inherent characteristic that can result in changes in the data distribution and in the discovered patterns. This represents a major challenge for traditional static learning algorithms, since the learned model may become obsolete over time (GAMA, 2012). Hence, optimization algorithms for dynamic environments can be applied successfully to solve this problem, for instance, by discovering when changes occur in the data distribution (NGUYEN; YANG; BRANKE, 2012; MAVROVOUNIOTIS; LI; YANG, 2017; KAMOSI; HASHEMI; MEYBODI, 2010).

Most of the works found in the literature involving optimization algorithms address problems involving the maximization or minimization of a function (KARABOGA; BASTURK, 2007; ZHU; KWONG, 2010; MIRJALILI; LEWIS, 2016; BERTSEKAS; SCIENTIFIC, 2015; RAO, 2016; DOĞAN; ÖLMEZ, 2015; MIRJALILI, 2015; ASKARZADEH, 2016; LIU et al., 2016; YAZDANI; JOLAI, 2016). Usually these functions involve problems generated from a deterministic sequence of possible solutions, often requiring the use of at least the first derivative of the objective function with respect to the design variables (YANG, 2010). In these methods, the objective function and constraints are given as mathematical functions and functional relations. In addition, the objective function must be continuous and differentiable in the search space. There are theorems for deterministic problems that guarantee convergence to an optimal solution that is not necessarily the global optimum (SERGEYEV; KVASOV, 2015). In this type of problem the solution found is extremely dependent on the starting point provided and may converge to a local optimum. Thus, these methods do not perform well in optimizing multimodal functions (KVASOV; SERGEYEV, 2015; LERA; SERGEYEV, 2015).

However, there are approaches that use optimization algorithms modeled for other purposes, such as finding patterns for financial market operations (CANELAS; NEVES; HORTA, 2012; CANELAS; NEVES; HORTA, 2013; SOUZA; BRASILEIRO; OLIVEIRA, 2015; BAÚTO et al., 2018), classification (THORNTON et al., 2013), image processing (MINAEI-BIDGOLI; PUNCH, 2003), data mining (YANG, 2013), and production system maintenance (MAVROVOUNIOTIS; LI; YANG, 2017; GEORGOULAS et al., 2015; KARVELIS et al., 2015). In objective function optimization, there is no difference between training and testing; the whole process is done iteratively and incrementally on the function to be optimized itself. However, when used for machine learning or for finding patterns, it is necessary to segment the data into training and test sets. As a result, a number of problems arise, such as overfitting, which occurs when the model fits the training dataset too closely. Generally, when this situation occurs, the trained model achieves high accuracy on the training set but does not generalize well to other data. This can result in poor predictive performance on the test sets. So it is important to avoid overfitting (TUITE et al., 2011).

According to Becker and Seshadri (2003), some techniques, such as (i) using a validation set or (ii) using early stopping criteria, can be used to avoid overfitting. In both approaches, firstly, the time series is divided into three parts: training set, validation set and test set. In (i), the fitness evaluation of the solutions found in the training space is performed in the validation space. In (ii), there are two possibilities. In the first one, the solutions are obtained from the training set and their fitness is evaluated at predetermined time intervals in the validation set. The process is terminated when a degradation in the generalization capability is detected in the validation space. The second possibility is to finalize the training process after 𝑛 iterations without improvement in the best fitness found during the training. In this case, the validation set is not required.
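A minimal sketch of the first variant of (ii) is given below (illustrative Python only, not the thesis implementation; swarm, train_fitness, validation_fitness and the patience counter are assumed placeholders): the optimizer runs on the training window, the best solution is periodically re-evaluated on the validation window, and the search halts once the validation fitness stops improving.

def optimize_with_early_stopping(swarm, train_fitness, validation_fitness,
                                 max_iters=1000, check_every=10, patience=5):
    # Evolve `swarm` on the training data, keep the solution that scored best
    # on the validation data, and stop after `patience` checks without improvement.
    best_val, best_solution, stalled_checks = float("-inf"), None, 0
    for it in range(max_iters):
        swarm.step(train_fitness)                    # one optimization iteration (placeholder API)
        if (it + 1) % check_every == 0:
            val = validation_fitness(swarm.global_best)
            if val > best_val:
                best_val, best_solution = val, swarm.global_best
                stalled_checks = 0
            else:
                stalled_checks += 1                  # generalization stopped improving
                if stalled_checks >= patience:
                    break                            # early stop to avoid overfitting
    return best_solution

1.1 MOTIVATION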

Over the past few decades, CI has been dedicated to designing machine learning algorithms that can learn and model specific problems to support decision making. One approach in this area is supervised learning, which attempts to learn about a knowledge domain by looking at previous cases or instances of the problem and their solutions. This approach aims to identify and model the relationship between the descriptor attributes and the outputs of past instances. The model that represents the relationship between inputs and outputs represents the knowledge domain and can be used to solve unseen instances of the same problem. Examples of these problems are present in people’s daily lives: spam filtering, credit card fraud detection, credit risk assessment, stock price forecasting, among others (GUZELLA; CAMINHAS, 2009; ATSALAKIS; VALAVANIS, 2009).

Classical supervised learning assumes that the statistical distribution of instances in a knowledge domain is immutable, i.e., the examples belong to a fixed and unchanging probability distribution (GAMA et al., 2004). If this assumption were valid for the whole machine learning prediction process, once an algorithm learned how to perform a task, the predictive model could be used to perform that task at any time in the future. In this way, after the learning process, the system would not need further improvements or changes. However, in real-world applications, data arrives in a stream and, as a result, patterns evolve over time, as concepts are often unstable (WEBB et al., 2016; ŽLIOBAITĖ; PECHENIZKIY; GAMA, 2016). This is because dynamism is inherent in data streams, since data is collected over a long period of time. In practice, this instability implies that the instance set has one legitimate output at a given time and another legitimate output at a different time (SETHI; KANTARDZIC, 2017).

The financial market is one area of CI research that has been receiving a lot of attention from researchers. Among the many existing approaches, one that has been receiving attention is related to machine learning systems capable of supporting market decisions. So, the main motivation of this work is to propose a decision support system that tries to maximize profits in operations carried out in the stock market and that is capable of dealing with the uncertainties and risks involved. The goal is to automatically decide on the right times for stock trading, supporting investors’ decisions.

According to the Aite Group’s report (http://www.aitegroup.com/reports/200610311.php), in 2006 about 33% of US stock market operations were already controlled by automated systems and in 2010 that figure was expected to rise to 50%. Similarly, in 2007 it was estimated that around 40% of the London stock exchange transactions were also carried out automatically. According to the Albert J. Menkveld website, nowadays 70% of transactions on US stock exchanges and 39% in Europe are made by automatic systems. It is expected that, in the future, the markets will be dominated by automatic systems, playing a key role in the stock market.

In 2015, the PAA-PSO (SOUZA; BRASILEIRO; OLIVEIRA, 2015) was proposed as a model that uses data mining techniques in financial time series for stock trading. This approach uses the combination of both the PAA and the Particle Swarm Optimization (PSO) techniques and tries to find the best representative pattern in the time series to optimize financial results in terms of profit, considering the decision rules proposed by the authors. The results showed that the PAA-PSO (SOUZA; BRASILEIRO; OLIVEIRA, 2015) obtained equal or better results when compared to SAX-GA (CANELAS; NEVES; HORTA, 2012), which combines SAX with Genetic Algorithm (GA) techniques. The PAA-PSO also outperformed the Buy-and-Hold strategy, in which the investor buys the stock and holds it in his/her investment portfolio for a long period of time. The Buy-and-Hold strategy is widely used as a reference for models applied in the financial market (BRASILEIRO et al., 2013; TEIXEIRA; OLIVEIRA, 2009; TEIXEIRA; OLIVEIRA, 2010; CANELAS; NEVES; HORTA, 2012; CANELAS; NEVES; HORTA, 2013; CAVALCANTE et al., 2016; SOUZA; BRASILEIRO; OLIVEIRA, 2015).
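For reference, the Buy-and-Hold benchmark can be computed directly from the price series; the small sketch below (illustrative only, with hypothetical prices and ignoring transaction costs) returns its return on investment over a test period.

def buy_and_hold_roi(prices):
    # ROI of buying at the first price of the period and holding until the last one.
    return (prices[-1] - prices[0]) / prices[0]

print(f"Buy-and-Hold ROI: {buy_and_hold_roi([25.0, 27.3, 24.8, 31.1]):.1%}")  # 24.4%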

However, both PAA-PSO (SOUZA; BRASILEIRO; OLIVEIRA, 2015) and SAX-GA (CANELAS; NEVES; HORTA, 2012) approaches produced a high variance in their results. In practice, this generates poor results for a decision support system, since such variability in the transactions represents a high risk for investors.

In addition, to deal with problems involving time series it is necessary to detect changes in the concept, as discussed above. On the other hand, Nguyen, Yang and Branke (2012) and Mavrovouniotis, Li and Yang (2017) state that most works in the optimization literature focus on static problems, yet real-world problems are more complex as they need to take into account changes that may occur over time.

1.2 DEFINITION OF THE PROBLEM

Based on this context, the main research question investigated in this thesis is:

Research question: Is it possible to build a system for stock trading that significantly improves results, considering the fact that the optimization environment is dynamic, by including elements such as (1) multi-swarms, (2) hybrid methods based on swarm optimization, (3) reduction of the variance in results, and (4) methods to detect changes in the environment?


In order to answer this question, it is necessary to understand the approaches that exist in the literature, to choose the most appropriate models that can be used both for the development of a hybrid architecture and for the evaluation of the obtained results. Thus, the objectives of this study are defined in the following section.

1.3 OBJECTIVES

The main objective of this study is to develop a decision support system able to decide which operations to carry out in the stock market. To this end, data mining techniques responsible for discovering patterns in the financial time series will be used along with the proposed decision rules. Also, the optimization of the discovered patterns and the detection of changes in the environment are performed through optimization algorithms for dynamic environments. Motivated by the goal of overcoming the main problems found in the literature, as discussed in Section 1, a multi-swarm method for dynamic environments with validation and early stopping criteria is proposed. The proposed method should be:

1. Able to reduce the dimensionality of financial time series, because of the large amount of data;

2. Efficient in discovering time series patterns along with decision rules, deciding the best times to buy and sell stocks;

3. Prepared to detect and respond to changes in the dynamic environment;

4. A decision support system to help investors in their operations;

5. Able to produce results with low variance and, thus, with controlled investment risk;

6. Optimized to maximize profit;

7. Capable of making profits above other methods, such as those presented in Section 1.

1.4 CONTRIBUTIONS

The proposed models for solving the problems discussed in this chapter, namely PAA-MS-IDPSO-V and PAA-MS-HmSO-V, are the main contributions of this study. These models led to the following contributions (Chapters 3 and 4, respectively):

• BRASILEIRO, R. C.; SOUZA, V. L.; OLIVEIRA, A. L. Automatic trading method based on piecewise aggregate approximation and multi-swarm of improved self-adaptive particle swarm optimization with validation. Decision Support Systems, Elsevier.

• BRASILEIRO, R. C.; SOUZA, V. L.; OLIVEIRA, A. L. Automatic method for trading stocks in the financial market based on data mining techniques and multi swarm optimization algorithm for dynamic environments. Submitted to Engineering Applications of Artificial Intelligence, Elsevier, 2019.

Two additional contributions were developed during this study: (1) the PAA-PSO model, which was compared to the SAX-GA and originated the initial development of this thesis (SOUZA; BRASILEIRO; OLIVEIRA, 2015); and (2) a literature review on CI applied to the financial market (CAVALCANTE et al., 2016). This last work identified the absence of optimization algorithms applied to real-world problems capable of detecting changes in financial markets, which served as the basis for the main models proposed in this thesis.

1.5 ORGANIZATION

Fig. 2 depicts the overview of this thesis, where the boxes represent the chapters and the submitted papers while the arrows indicate the sequence flow of the thesis. Chapters within the dotted region represent the proposed models that have been published or were submitted during the development of this thesis. Chapters 3 and 4, inside the blue box, present, respectively, the contributions related to the PAA-MS-IDPSO-V and PAA-MS-HmSO-V models. In the green region, secondary works that were published and inspired this thesis are highlighted.

Fig. 2 – General overview of the thesis, where boxes represent the chapters and other contributions of the thesis. The box in blue presents the contributions on the topic of optimization based on multi-swarm and optimization for dynamic environments. The box in green shows the contributions as co-author on the topics of optimization.

The thesis begins with Chapter 1 (present chapter) containing its introduction, motivations and the objectives of this study.


Chapter 2 reviews the literature on the fundamental concepts of this thesis. First, some decision support systems for stock trading are presented. Next, data mining in time series is detailed, explaining the dimensionality reduction approaches, as well as optimization functions, focusing on multimodal functions. Afterwards, optimization algorithms are explained, specifically PSO, the Improved Self-adaptive Particle Swarm Optimization algorithm (IDPSO), Hibernating Multi-Swarm Optimization (HmSO), multi-swarm approaches and dynamic optimization approaches. This chapter also includes a discussion of validation methods and the early stopping criterion for optimization algorithms.

Chapter 3 introduces the PAA-MS-IDPSO-V model. In this chapter, initially, the PSO particle structure is presented along with the stock trading model. Then the PAA-MS-IDPSO-V model is presented and its effectiveness is demonstrated through a set of experiments performed on the stocks of the S&P100 index. At the end of the chapter we present a summary of the experiments and answer part of the research question raised in Section 1.2.

In Chapter 4 the PAA-MS-HmSO-V model is presented. The PSO particle structure and the trading model are exactly the same as presented in Chapter 3, so only the proposed model itself is presented. Then the experiments (again using most of the stocks from the S&P100 index) are presented, demonstrating that PAA-MS-HmSO-V outperforms PAA-MS-IDPSO-V with statistical confidence. In addition to comparing the results with the models in Chapter 3, we also perform comparisons using Random Forest and Support Vector Machine (SVM) classifiers, along with a series of technical analysis features. Finally, the summary of the experiments is presented to answer parts of the research question presented in Section 1.2.

Finally, in Chapter 5 the general conclusions and future work of this study are presented.


2 LITERATURE REVIEW

In this chapter, the following research areas are contextualized: (i) decision support systems, (ii) dimensionality reduction techniques used in time series data mining, (iii) optimization functions, focusing on multimodal functions, (iv) optimization algorithms, as the technique responsible for discovering the patterns in the time series, and the concept of multi-swarms, (v) optimization techniques for dynamic environments, and (vi) the use of the validation set and the early stopping criterion.

In (ii), emphasis is given to the PAA and SAX techniques. While the first is the technique used in this work, as it was in PAA-PSO (SOUZA; BRASILEIRO; OLIVEIRA, 2015), the second is used in the SAX-GA approach (CANELAS; NEVES; HORTA, 2012).

In (iii), the optimization functions and the problems in which they are involved are also mentioned. Here, the concepts of unimodal and multimodal functions are presented, with illustrations to exemplify and facilitate the understanding of the types of problems that are being addressed.

In (iv), the optimization algorithms (static approach) are presented followed by the multi-swarm algorithms. These concepts were used in model PAA-MS-IDPSO-V, which is detailed in Chapter 3.

In (v), optimization algorithms for dynamic environments are introduced (highlighting the HmSO algorithm), along with the techniques for detecting changes in the environment and the actions to be performed after a change is detected. A new hybrid mechanism for detecting changes in the environment is also presented here. The HmSO algorithm was used in this work with essential modifications, employing a multi-swarm architecture, which is detailed in Chapter 4.

Finally, all the problems involving overfitting in optimization algorithms are addressed. We present solutions to this problem, emphasizing the validation set and the early stopping criterion.

2.1 DECISION SUPPORT SYSTEMS FOR STOCK TRADING AND DATA STREAMS

Decision Support Systems (DSSs) are systems that analyze and compile large amounts of data and use this information to assist users in their decision making (CABRERA-PANIAGUA et al., 2015). These systems are used in a variety of areas, including finance and the stock market (CABRERA-PANIAGUA et al., 2015; ZHANG et al., 2011; KAO et al., 2013; GEVA; ZAHAVI, 2014; OLIVEIRA; CORTEZ; AREAL, 2016; SHYNKEVICH et al., 2016).

Zhang et al. (2011) proposed a DSS based on an ensemble of decision tree classifiers, logistic regression and Support Vector Machine (SVM), to find noise in the time series without the need for any pre-processing method. Initially, each classifier in the ensemble is trained using different parts of the training set. This diversity gives the ensemble the ability to identify noise and deal with it. The experimental results show that the proposed method is superior to other methods found in the literature that try to detect noise in the time series (ZHANG et al., 2011).

Kao et al. (2013) proposed a DSS to predict stock prices based on a hybrid approach, using feature extraction with Multivariate Adaptive Regression Splines (MARS) and Support Vector Regression (SVR). This approach is composed of three stages: initially, the time series is transformed into sub-series; then the approach uses MARS to detect the significant sub-series. Finally, these sub-series containing the key features are used as new input variables in the SVR to construct the DSS model and to evaluate the prediction accuracy. The study shows that the proposed approach outperforms other competing models.

Geva and Zahavi (2014) proposed a system that incorporates financial market data and textual news information and recommends when the purchase and sale of stocks should be performed. According to the authors, the integration of these two data sources has the potential to detect patterns that remain unknown when each data source is used separately. In addition, it was verified that the results are even better when using nonlinear neural networks as classifiers, which according to the authors are well suited for predicting time series values, such as stock prices.

Oliveira, Cortez and Areal (2016) proposed an automatic method to perform sentiment analysis in the stock market through messages obtained from microblogs (in this case StockTwits, a microblog dedicated to the financial market, was used). The approach is responsible for creating financial market lexicons, which are based on statistical metrics computed from a set of StockTwits messages. According to the authors, the results showed that the proposed approach is capable of identifying investors’ sentiments, contributing to decision making.

According to Shynkevich et al. (2016), the market changes as new information is released, for example, information derived from news articles. Such information is capable of affecting investors’ decision making. In this context, in (SHYNKEVICH et al., 2016) the authors proposed a DSS capable of simultaneously reading these news articles and assigning different degrees of relevance to the acquired information. The more information is obtained, the better the method performs, being able to outperform other approaches.

Pinto, Neves and Horta (2015) proposed a Multi-Objective Evolutionary System, using a Multi-Objective GA to optimize a set of Trading or Investment Strategies (TSs), aiming to predict future trends in the stock market. In this approach, the Volatility Index (VIX) and other Technical Indicators (TI) are optimized to find the best investment strategy to set potential buy, sell or hold positions in the stock markets and obtain high returns at minimal risk. By using TI the system can obtain the trend of the stock price, and the VIX indicator can be used to avoid the main stock drops. Also, this approach has great generalization potential; through the Pareto fronts the system can obtain a set of TSs suitable for the profiles of both risky and careful investors.

Majhi and Anish (2015) proposed two multiobjective Legendre Polynomial Based Adaptive Nonlinear (LPBAN) forecasting models with a derivative-free training scheme and a fuzzy decision-making rule. The authors compare the use of the Multi-Objective Particle Swarm Optimization (MOPSO) and Non-dominated Sorting Genetic Algorithm (NSGA-II) multiobjective optimization algorithms by simultaneously optimizing four performance measures: Mean Absolute Percentage Error (MAPE), Directional Accuracy (DA), Theil’s U-statistic and Average Relative Variance (ARV). A simulation study with real-life data demonstrated that the two models are competitive with each other in some respects. A fuzzy-logic-based selection strategy was employed to choose the best possible solutions from the Pareto front.

Cervello-Royo, Guijarro and Michniuk (2015) proposed a risk-adjusted profitable trading rule based on technical analysis and on a new definition of the flag pattern to define buy or sell moments over time. The main contributions of this method are: 1) a new definition of the flag pattern, which strengthens the statistical robustness of the pattern and its use in the design of the trading rule; 2) trading rules that use closing and opening prices and run stop-loss and take-profit operators; 3) the maximum drawdown of the return curves is used as a risk measure; 4) the results provide empirical evidence that it was possible to develop an investment strategy capable of beating the market in the mean–variance sense, which confronts the Efficient Market Hypothesis.

Hu et al. (2015) proposed a Hybrid Evolutionary Trend Following Algorithm (eTrend) combining Trend Following (TF) investment strategies with the eXtended Classifier System (XCS) as evolutionary learning, capable of operating in both the long term and the short term. Through the XCS, many valuable trading rules are discovered at run time, giving this approach the ability to automatically adapt to market directions and uncover reasonable and understandable trading rules, avoiding the irrational trading behaviors of common investors. The analysis of the obtained results revealed that eTrend outperforms the buy-and-hold strategy, decision tree and artificial neural network trading models.

Zbikowski (2015) proposed a trading strategy combining Volume-Weighted SVM with F-Score feature selection and several technical indicators to make predictions about future trends of stocks. This approach assumes that incorporating volume-based weighting into the penalty function can lead to a significant improvement in classifier accuracy. The experimental results showed that the proposed approach combined with feature selection significantly improves the sample trading strategy results in terms of both overall rate of return and maximum drawdown.

Regarding the portfolio optimization problem, investors seek to select the best diversification of the investments present in their portfolio, according to some criteria, such as lower risk. The composition of investments in a portfolio depends on a number of factors, among which the most important are risk tolerance, investment horizon and value invested (HUANG, 2012). In this context, Huang (2012) proposed a hybrid model that uses Support Vector Regression (SVR) combined with GA to select portfolio stocks. In this model, the SVR method is used to provide reliable stock rankings. The best rated stocks can thus be selected to form a portfolio. On top of this model, GA is used both for optimizing the model parameters and for selecting input variables for the SVR model.

Still regarding the portfolio optimization problem, Chen (2015) proposed a model using Fuzzy Logic and the Artificial Bee Colony (ABC) algorithm. Gorgulho, Neves and Horta (2011) proposed a new approach, based on Intelligent Computing, in particular using GA, which aims to manage a financial portfolio through technical analysis indicators, including: Exponential Moving Average (EMA), Rate of Change (ROC), Relative Strength Index (RSI), Moving Average Convergence Divergence (MACD) and On Balance Volume (OBV). Zhu et al. (2011) presented a portfolio selection model using a multiobjective variation of PSO. The multiobjective formulation arises from the simultaneous maximization of profit and minimization of risk.

2.2 TIME SERIES DATA MINING

Data mining can be defined as “making the best possible use of data”, based on the theory that historical data stores information capable of estimating its future behavior. In the context of the financial market, data mining can be used to find patterns in financial time series, representing their behavior through their trends, seasonality, and other information (ARGIDDI; APTE, 2014). Thus, the patterns found in historical financial time series can be used to define the best moments for buying or selling stock, helping investors in their decision making (KANNAN et al., 2010).

As presented, the main objective of data mining is to uncover hidden information from the data, either in its original or in a transformed format. However, the main characteristic of time series data is their high dimensionality, which makes working with them a computationally expensive task. Methods that deal with this problem have been proposed, among them dimensionality reduction techniques capable of creating a representation of the original series in other domains with fewer dimensions. In this way, operations using similarity measures can be performed on the transformed data, as if they were the original data, at a lower computational cost (FU, 2011). These similarities can be used to perform other tasks, such as pattern discovery, classification and clustering, among others (FU, 2011).

Thus, in this section dimensionality reduction techniques applied to financial time series are presented. A greater emphasis is placed on the description of the Piecewise Aggregate Approximation (PAA) technique in the context of pattern discovery.


2.2.1 State of the art: Dimensionality reduction

The most commonly used techniques for dimensionality reduction are: Fourier Transform (FT), Dynamic Time Warping (DTW), Singular Value Decomposition (SVD), PAA and SAX (SUN et al., 2014). These techniques share a drawback: due to the dimensionality reduction, they lose information that is useful for time series analysis. However, their use in combination with other approaches may reduce these losses or their effects (SUN et al., 2014).

DTW is a well-established technique for finding the optimal alignment between two time series under certain constraints. The sequences are generally warped in a nonlinear fashion so that they match one another. This approach has been successfully used, among other areas, to compare different sequences or patterns and for information retrieval (THONGMEE et al., 2014).
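To make the alignment idea concrete, a minimal dynamic-programming sketch of the DTW distance is shown below. This is an illustrative Python implementation, not code from any of the cited works; the function name and example data are hypothetical.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic-programming DTW distance between two 1-D sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)  # cumulative cost matrix
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])            # local distance between points
            cost[i, j] = d + min(cost[i - 1, j],     # advance in a only
                                 cost[i, j - 1],     # advance in b only
                                 cost[i - 1, j - 1]) # advance in both
    return cost[n, m]

# Two series with the same shape but shifted in time: the DTW distance stays small
print(dtw_distance([1, 2, 3, 4, 3, 2], [1, 1, 2, 3, 4, 3, 2]))
```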

In Thongmee et al. (2014), the authors use the Blockwise Strong Relationship (BSR) method, which calculates the relationship between any pair of stocks in the financial market, considering only their prices. To accomplish this task, the authors used data transformation with SAX and distance measurement using DTW.

Tsinaslanidis and Kugiumtzis (2014) developed a stock price prediction algorithm based on the combination of two data mining tools. The first is Perceptually Important Points (PIP), which is used to dynamically segment financial time series into smaller windows, and the second is DTW, which was used to find similar windows. Finally, predictions are made from mappings between the most similar windows.

Bagheri et al. (2014) presented a hybrid method that combines the DTW technique with the Wavelet Transform (WT) method for automatic pattern extraction in financial time series, specifically for the Foreign Exchange (FOREX) market. The results indicate that the proposed hybrid method is very useful and effective for the prediction and extraction of financial patterns.

According to Fourier, any function can be represented as the sum of a series of sine and cosine functions. In time series, FT is used for the spectral analysis of the series, i.e., the study of the behavior of the series in terms of an average frequency composition. In general terms, spectral analysis can be used to detect periodic signals (METCALFE; COWPERTWAIT, 2009).
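As a simple, hypothetical illustration of this spectral analysis (not drawn from the cited reference), the sketch below uses NumPy's FFT to recover the dominant period of a synthetic series containing a 20-day cycle.

```python
import numpy as np

# Synthetic daily series with a 20-day cycle plus noise (illustrative data only)
t = np.arange(240)
series = np.sin(2 * np.pi * t / 20) + 0.3 * np.random.randn(len(t))

# Amplitude spectrum of the zero-mean series (real-valued FFT)
spectrum = np.abs(np.fft.rfft(series - series.mean()))
freqs = np.fft.rfftfreq(len(series), d=1.0)  # frequencies in cycles per day

# Skip the zero-frequency bin and pick the strongest component
dominant = freqs[np.argmax(spectrum[1:]) + 1]
print(f"Dominant period: {1 / dominant:.1f} days")  # approximately 20 days
```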

Chen and Chen (2014) integrate the entropy discretization technique with the Fast Fourier Transform (FFT) algorithm to create a fuzzy time series prediction model. The proposed model is implemented using the bootstrapping method, which incrementally updates its predictive capability. The results indicate that the model’s incremental learning mechanism enables it to effectively deal with large financial datasets.

Huang, Zhu and Ruan (2014) propose a model based on the FFT approach for valuing stock options when assets follow the double exponential jump process with stochastic volatility and intensity. This model captures three structural terms: stock prices, implied market volatility, and jump behavior. The results showed that the FFT approach is fast and produces good results.

SVD, as described by Banerjee and Pal (2014), is a widely used technique for decomposing a matrix into several component matrices that expose important properties of the original matrix. Thus, SVD is a data reduction technique that can be understood as a method for transforming correlated variables into a set of uncorrelated variables that expose the various relationships among the original data items. SVD is also a technique for identifying and sorting the dimensions along which the data points exhibit the most variation.
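A minimal sketch of this use of SVD for dimensionality reduction, applied to a hypothetical matrix of correlated daily returns, could look like the following; the data and variable names are illustrative only and not taken from the cited works.

```python
import numpy as np

# Hypothetical matrix of daily returns: 250 days x 10 correlated stocks
rng = np.random.default_rng(42)
market = rng.standard_normal((250, 1))  # common market factor
returns = market @ rng.standard_normal((1, 10)) + 0.1 * rng.standard_normal((250, 10))

# SVD of the centered data matrix: X = U * diag(s) * Vt
X = returns - returns.mean(axis=0)
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Singular values come sorted, so the leading components capture most variation
explained = s ** 2 / np.sum(s ** 2)
print("Variation captured by the first component:", round(float(explained[0]), 3))

# Rank-2 representation of the original 10-dimensional data (dimensionality reduction)
reduced = X @ Vt[:2].T  # shape (250, 2)
```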

When applied to time series, SVD was used by Caraiani (2014), who proposed a correlation-based approach to analyzing US stock market financial data. The model uses the SVD-based entropy calculation of the correlation matrix of the Dow Jones index components.

Canelas, Neves and Horta (2012) and Canelas, Neves and Horta (2013) proposed two models for automating financial market operations using the SAX technique. In both models, SAX is combined with a GA-based optimization to identify the most relevant patterns of financial time series that are used with an investment rule for automating stock trading.

2.2.2 Piecewise Aggregate Approximation (PAA)

Piecewise Aggregate Approximation (PAA), proposed by Keogh et al. (2001), is an approach used for data representation and dimensionality reduction in time series data mining. In this method, a time series window of size n is divided into k segments of equal length, and the average value of the data in each segment is then used as the representative value of that segment. Hence, the PAA representation of a time series will be a k-dimensional vector of the mean values of the segments. Fig. 3 depicts an example of a PAA representation.

PAA is performed in two steps (KEOGH et al., 2001). Initially, the original time series window data must be standardized. The purpose of this step is to convert the data to the same relative amplitude, keeping the original form of the data. The statistical standardization is computed via Eq. (2.1).

$$x'_i = \frac{x_i - \mu_x}{\sigma_x} \qquad (2.1)$$

where $x_i$ represents a point in the time series window, $\mu_x$ and $\sigma_x$ are, respectively, the average and standard deviation considering all points in the window, and $x'_i$ is a point in the standardized series window. Fig. 4 depicts an example of the data standardization.

The next step is the dimensionality reduction carried out by PAA. Let the time series size be $m$, let a window of this series have size $n$ ($n \ll m$), and let $k$ be the number of segments to which the time window will be reduced. Let $\bar{x}_i$ be the representative value of the $i$th segment and $\vec{x}_j$ be the vector of elements of the segment. Then, if the relationship $n/k$ results in an integer, the representative value of each segment is computed by Eq. (2.2).


Fig. 3 – PAA representation of a time series Q. In this example, PAA parameters are n = 15, k = 5.

Fig. 4 – Statistical standardization of the time series data.

$$\bar{x}_i = \frac{k}{n} \sum_{j=\frac{n}{k}(i-1)+1}^{\frac{n}{k}i} x_j \qquad (2.2)$$

However, if the relationship $n/k$ does not result in an integer, the border points between the segments should contribute to the formation of the final value for both segments. In this case, PAA carries out the operation as shown in Fig. 5.

As can be seen in Fig. 5, in which there are 12 points and five segments, each segment must receive 2.4 points of contribution to its average. Thus, points 1 and 2 entirely belong to segment S1 and, to complete it, its value receives 40% of point 3. Segment S2 is formed by the remaining 60% of point 3, the whole of point 4 and 80% of point 5, completing the 2.4 points for computing its average. PAA fills the rest of the segments as explained and depicted in Fig. 5.
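To make this splitting rule concrete, a minimal Python sketch of PAA is given below, including the z-standardization of Eq. (2.1) and the fractional contribution of border points when $n/k$ is not an integer. The function names are illustrative and this is not the implementation used in this thesis.

```python
import numpy as np

def standardize(window):
    # Statistical standardization of a time series window, as in Eq. (2.1)
    window = np.asarray(window, dtype=float)
    return (window - window.mean()) / window.std()

def paa(window, k):
    # PAA representation: each of the k segments receives exactly n/k points of
    # contribution, so border points are split proportionally between adjacent
    # segments when n/k is not an integer (as in Fig. 5).
    x = np.asarray(window, dtype=float)
    n = len(x)
    seg_len = n / k
    means = np.zeros(k)
    for i in range(k):
        start, end = i * seg_len, (i + 1) * seg_len
        total = 0.0
        j = int(np.floor(start))
        while j < end and j < n:
            overlap = min(end, j + 1) - max(start, j)  # fraction of point j inside segment i
            total += x[j] * overlap
            j += 1
        means[i] = total / seg_len
    return means

# Example with n = 12 and k = 5: segment S1 takes points 1 and 2 plus 40% of point 3
print(paa(standardize(np.arange(1.0, 13.0)), 5))
```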


Fig. 5 – PAA segments when the relationship $n/k$ does not result in an integer. In this example, the PAA parameters are $n = 12$, $k = 5$, which means that each segment must receive 2.4 points of contribution to the average.

PAA has some advantages: it is easy to implement; its execution is very fast; it generates a flexible model; indexing can be done in linear time; and the proximity between two time series can be easily computed by the Minimum Distance (MINDIST) between their respective PAA representations (KEOGH et al., 2001; CHAKRABARTI et al., 2002).

Given $Q$ and $C$, the PAA representations of two time series, MINDIST is computed by Eq. (2.3). Fig. 6 illustrates the computation of MINDIST between $Q$ and $C$ (the PAA representations of two time series).

$$MINDIST(Q, C) = \sqrt{\frac{n}{k}} \sqrt{\sum_{i=1}^{k} \big(dist(q_i, c_i)\big)^2} \qquad (2.3)$$

Fig. 6 – Computing MINDIST between two PAA time series representations, $Q$ and $C$.

A relevant feature of MINDIST is that it is a lower bound of the Euclidean distance. This feature allows the method (i) to carry out data mining using the representation efficiently and (ii) to produce results identical to those obtained if one operates on the original data.
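A short sketch of Eq. (2.3) for PAA representations is shown below; it reuses the illustrative paa() and standardize() functions sketched earlier and is not the thesis implementation.

```python
import numpy as np

def mindist(q_paa, c_paa, n):
    # Lower bound of the Euclidean distance between two windows of size n,
    # computed from their k-dimensional PAA representations (Eq. 2.3)
    q = np.asarray(q_paa, dtype=float)
    c = np.asarray(c_paa, dtype=float)
    k = len(q)
    return np.sqrt(n / k) * np.sqrt(np.sum((q - c) ** 2))

# Hypothetical usage, assuming paa() and standardize() from the earlier sketch:
# q_rep = paa(standardize(window_q), k=5)
# c_rep = paa(standardize(window_c), k=5)
# print(mindist(q_rep, c_rep, n=len(window_q)))
```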
