Study of the influence of climate forcings on seasonal precipitation forecasting for the North and Northeast regions of Brazil


UNIVERSIDADE FEDERAL DO RIO GRANDE DO NORTE
CENTRO DE CIÊNCIAS EXATAS E DA TERRA
Programa de Pós-Graduação em Ciências Climáticas

STUDY OF THE INFLUENCE OF CLIMATE FORCINGS ON SEASONAL PRECIPITATION FORECASTING FOR THE NORTH AND NORTHEAST REGIONS OF BRAZIL

RONABSON CARDOSO FERNANDES

NATAL, RN
NOVEMBER 2017

STUDY OF THE INFLUENCE OF CLIMATE FORCINGS ON SEASONAL PRECIPITATION FORECASTING FOR THE NORTH AND NORTHEAST REGIONS OF BRAZIL

RONABSON CARDOSO FERNANDES

Doctoral thesis submitted to the Programa de Pós-Graduação em Ciências Climáticas of the Centro de Ciências Exatas e da Terra, Universidade Federal do Rio Grande do Norte, in partial fulfillment of the requirements for the degree of Doctor in Climate Sciences.

Advisor: Prof. Dr. Paulo Sérgio Lucio
Co-advisor: Prof. Dr. José Henrique Fernandez

EXAMINING COMMITTEE
Prof. Dr. David Mendes (UFRN), internal examiner
Prof. Dr. Gilvan Luiz Borba (UFRN), internal examiner
Prof. Dr. Josemir Araújo Neves (EMPARN), external examiner
Prof. Dr. Éverton Frigo (UNIPAMPA), external examiner

NATAL, RN
NOVEMBER 2017

Cataloguing-in-Publication. UFRN / SISBI / Biblioteca Setorial Centro de Ciências Exatas e da Terra - CCET

Fernandes, Ronabson Cardoso. Study of the influence of climate forcings on seasonal precipitation forecasting for the North and Northeast regions of Brazil / Ronabson Cardoso Fernandes. - Natal, 2017. 111 leaves: ill.
Universidade Federal do Rio Grande do Norte, Centro de Ciências Exatas e da Terra, Programa de Pós-Graduação em Ciências Climáticas.
Advisor: Prof. Dr. Paulo Sérgio Lucio. Co-advisor: Prof. Dr. José Henrique Fernandez.
1. Modeling. 2. Spectral coherence. 3. Climatology. 4. Polynomials. I. Lucio, Prof. Dr. Paulo Sérgio. II. Fernandez, Prof. Dr. José Henrique. III. Title.
RN/UF/BSE-CCET

UNIVERSIDADE FEDERAL DO RIO GRANDE DO NORTE
PROGRAMA DE PÓS-GRADUAÇÃO EM CIÊNCIAS CLIMÁTICAS

RONABSON CARDOSO FERNANDES

STUDY OF THE INFLUENCE OF CLIMATE FORCINGS ON SEASONAL PRECIPITATION FORECASTING FOR THE NORTH AND NORTHEAST REGIONS OF BRAZIL

This thesis was judged adequate for the award of the degree of DOCTOR IN CLIMATE SCIENCES and was approved in its final form.

Natal, November 23, 2017.

DEDICATION

I dedicate this work especially to my family and to my mother, Maria das Neves Faustino Cardoso Fernandes.

ACKNOWLEDGMENTS

I thank God for His protection and unconditional help in difficult moments.

To my family: first my parents, José Carlos and Maria das Neves, then my siblings Robervância, Robson and Rawlinson, and my dear nephews and nieces.

To my advisors, Paulo Sérgio Lucio and José Henrique Fernandez, for their contribution to this challenge.

To CAPES and FAPERN for granting the scholarship.

To the Programa de Pós-Graduação em Ciências Climáticas (PPGCC) of the Universidade Federal do Rio Grande do Norte, for providing infrastructure, materials and human resources.

To my friends André L. de Carvalho, Henderson Wanderley, George Ulguim, Naurinete Barreto, Washington Filho, Patrícia Viana, Lais Schmalfuss and Débora Busmann, to the Geopro Laboratory, and to everyone who enriched me with joy and knowledge.

To my friends in the Climate Sciences graduate program.

To my professors.

To the staff who help us every day.

ABSTRACT

Given the prospect of an increase in rainfall extremes due to climate change, it is important to study the influence of solar-cycle activity and cosmic ray flux on this meteorological variable. Rainfall is a key factor for agriculture, the energy sector, livestock and the economy, which makes the relationship between these variables worth investigating. Initially, it was necessary to reconstruct several historical galactic cosmic ray (GCR) series with less than 60% of missing data; among the methods studied, MTSDI proved the best for imputing observational GCR data. The Huancayo (Peru) station was then chosen for correlation with rainfall in the North and Northeast of Brazil. The wavelet coherence technique showed statistical coherence between rainfall and both GCR and sunspot number (SSN) at the monthly, seasonal, annual, interannual and interdecadal scales. The Maximal Overlap Discrete Wavelet Transform (MODWT) showed a significant correlation between the series at scales of 5.3, 10.6, 22.3 and 44.6 years. Finally, a model was built to predict the historical rainfall series, with satisfactory results. This research therefore showed that cosmic ray flux and solar activity influence rainfall in the Brazilian tropical region.

Keywords: modeling, spectral coherence, climatology, polynomials.

CONTENTS

LIST OF FIGURES
LIST OF TABLES
LIST OF ACRONYMS
INTRODUCTION
CHAPTER I - Data Imputation Analysis for Cosmic Rays Time Series
CHAPTER II - Periodic determination of the galactic cosmic rays with application of the Maximal Overlap Discrete Wavelet Transform (MODWT)
CHAPTER III - Study of the Galactic Cosmic Rays and Sunspots influence over the Brazilian Northern and Northeastern precipitation
CHAPTER IV - How do Galactic Cosmic Rays and Sunspots influence South American precipitation?
CHAPTER V - A Sun-Based Model for Tropical Rainfall Estimation
CONCLUSION
REFERENCES
APPENDIX

LIST OF FIGURES

CHAPTER I FIGURES
Fig. 1. (a) Spatial distribution of GCR NM stations over the globe, in the Mollweide projection, and (b) locations of the Climax (CLMX) and Rome (ROME) GCR NM stations, in the azimuthal equal-area projection.
Fig. 2. GCR intensity: (a) monthly observed time series for Climax (blue) and Rome (magenta) and (b) their correlation, for the period from January 1st, 1960 to December 31st, 2004.
Fig. 3. Average coefficients (a) d, (b) R, (c) R² and (d) NSE for the respective imputation percentages, calculated by the Amelia II (blue), MICE (green) and MTSDI (red) models.
Fig. 4. Average NRMSE coefficients for the respective imputation scenarios in the Amelia II (blue), MICE (green) and MTSDI (red) models.
Fig. 5. Comparison between the original ROME station GCR time series and the series reconstructed under missing-data scenarios 1-9, for the period from January 1960 to December 2004, obtained using the (a) Amelia II, (b) MICE and (c) MTSDI models.
Fig. 6. Comparison between the original CLMX station GCR time series and the series reconstructed under missing-data scenarios 1-9, for the period from January 1960 to December 2004, obtained using the (a) Amelia II, (b) MICE and (c) MTSDI models.
Fig. 7. F-test for the ROME station using the (a) Amelia II, (b) MICE and (c) MTSDI models, by percentage of missing data (scenario) and repetition.
Fig. 8. t-test for the ROME station using the (a) Amelia II, (b) MICE and (c) MTSDI models, by percentage of missing data (scenario) and repetition.
Fig. 9. GCR time series (a) original and (b) with imputations.

CHAPTER II FIGURES
Figure 1. Spatial position of NM stations (Kiel, Rome, Climax and Huancayo).
Figure 2. Time series of NM stations (a) Kiel, (b) Rome, (c) Climax and (d) Huancayo, during the period from January 1960 to December 2004.
Figure 3. MODWT multiresolution decomposition for the time series of (a) Kiel, (b) Rome, (c) Climax and (d) Huancayo, during the period from January 1960 to December 2004.
Figure 4. Percentage energy distribution of the MODWT, by multiresolution (MRA) crystal, for the time series of (a) Kiel, (b) Rome, (c) Climax and (d) Huancayo, for the period from January 1st, 1960 to December 31st, 2004.
Figure 5. Percentage energy distribution of the MODWT, by multiresolution crystal, for the time series of (a) SSN and (b) 10.7 cm solar radio flux, for the period from January 1st, 1960 to December 31st, 2004.

CHAPTER III FIGURES
Fig. 1. Location of the 11 rain gauge stations and of the GCR station at Huancayo (Peru).
Fig. 2. Time series of (a) sunspot number (SSN) and (b) GCR (Huancayo, Peru) for the period from January 1st, 1961 to December 31st, 2004.
Fig. 3. Boxplot of the rainfall climatology for each rain gauge studied, for the period from January 1st, 1961 to December 31st, 2004.
Fig. 4. Wavelet coherence between the GCR (Huancayo/PER) and rainfall time series, by location. The black line marks the 5% significance level and the white line the cone of influence.
Fig. 5. Wavelet coherence between the SSN and rainfall time series, by location. The black line marks the 5% significance level and the white line the cone of influence.

CHAPTER IV FIGURES
Figure 1. Pearson correlation between precipitation and SSN and GCR at periodicities of (a, e) 5.3 years, (b, f) 10.6 years, (c, g) 21.3 years and (d, h) 46.6 years, at the 5% significance level. Blank regions had low or non-significant correlations.
Figure 2. Time series of the SSN.
Figure 3. Precipitation percentiles (a) 25th, (b) 50th, (c) 75th and (d) 95th, from cumulative monthly data from January 1st, 1961 to December 31st, 2004.
Figure 4. Number of months with n < 25%, 25% < n < 50%, 50% < n < 75% and n > 75% in periods of solar maximum (a, c, e, g) and solar minimum (b, d, f, h).
Figure 5. Number of months above the 95th percentile in periods of solar maximum (a) and solar minimum (b).

CHAPTER V FIGURES
Figure 1. Map of the study area.
Figure 2. Regions of the oceanic indices.
Figure 3. Polynomial regression between rainfall and the explanatory variables, from degree (a) 1 to (d) 4, for the Soure/AM station.
Figure 4. Multivariate polynomial regression between rainfall and the explanatory variables, from degree (a) 1 to (d) 4, for the Recife/PE station.
Figure 5. Observed and estimated time series according to the 3rd-order polynomial regression.

LIST OF TABLES

CHAPTER III TABLE
Table 1. Location of the 11 rain gauges by longitude, latitude and altitude, for the period from January 1st, 1961 to December 31st, 2004.

CHAPTER V TABLES
Table 1. Location of the meteorological stations by municipality, code, longitude and latitude, for the study period from 1961 to 2004.
Table 2. Coefficients of the 3rd-order multiple polynomial regression for the Soure/AM station, showing the most statistically significant values.
Table 3. Coefficients of the 3rd-order multiple polynomial regression for the Recife/PE station, showing the most statistically significant values.

LIST OF ACRONYMS

Al - Aluminum
Be - Beryllium
C - Carbon
CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
CCN - Cloud Condensation Nuclei
Cl - Chlorine
CLMX - Climax
CWT - Continuous Wavelet Transform
d - Agreement Index
E0 - Earth's eccentricity
EWD - Easterly Wave Disturbances
FAPERN - Fundação de Apoio à Pesquisa do Estado do Rio Grande do Norte
GCR - Galactic Cosmic Rays
GV - Gigavolt (unit of magnetic rigidity)
H2O - Water
H2SO4 - Sulfuric acid
He - Helium
HNO3 - Nitric acid
INMET - Instituto Nacional de Meteorologia
ITCZ - Intertropical Convergence Zone
Lat - Latitude
Long - Longitude
MCC - Mesoscale Convective Complex
MODWT - Maximal Overlap Discrete Wavelet Transform
MRA - Multiresolution Analysis
MICE - Multivariate Imputation by Chained Equations
MTSDI - Multivariate Time Series Data Imputation
NB - Northern region of Brazil
CN - Condensation Nuclei
Ne - Neon
NEB - Northeastern Brazil
NRMSE - Normalized Root Mean Square Error
NSE - Nash-Sutcliffe Efficiency coefficient
R - Correlation coefficient
R² - Coefficient of determination
RMSE - Root Mean Square Error
SE - Southeast
SIDC - Solar Influences Data Center
SILSO - Sunspot Index and Long-term Solar Observations
SA - Solar Activity
SSN - Sunspot Number
TRMM - Tropical Rainfall Measuring Mission
ULCV - Upper Level Cyclonic Vortex
W m-2 - Watts per square meter

1. INTRODUCTION

Given the prospect of an increase in rainfall extremes due to climate change, it is important to study the influence of solar-cycle activity and cosmic ray flux on this meteorological variable. Rainfall is a key factor for agriculture, the energy sector, livestock and the economy, from the global to the local scale; in addition, reductions in the volume and rate of precipitation in time and space cause great losses to society. Cosmic rays play a significant role in ionizing gases present in the atmosphere, converting them into condensation nuclei and thereby favoring the formation of clouds and precipitation. Variation in the availability of these condensation nuclei influences the variability of cloud amount, which is reflected in air temperature and rainfall. This external forcing (solar activity and cosmic rays), combined with temporal and spatial thermal and oceanic variability, may be the key to a better understanding of the Earth's climate.

The motivation for this research is the scarcity of studies relating cosmic ray flux and solar activity to rainfall in the North and Northeast regions of Brazil. It starts from the hypothesis that variations in cosmic ray flux and solar activity influence rainfall in the study area.

Cosmic rays are high-energy extraterrestrial particles originating from the galactic core (Milky Way) and from the Sun. Experiments carried out with electroscopes by Wilson (1900, 1901) and by Elster and Geitel (1901) could not explain the causes of the ionization of the gases inside the instruments as altitude increased. The question was answered by Hess and Kolhörster, who showed, using a Geiger counter and a balloon, that gas ionization increased with altitude. The radiation was named "cosmic rays" because it did not originate from the Earth's surface.

Galactic cosmic rays are strongly attenuated by solar activity, with a cycle of approximately eleven years. This shows that the intensity of the cosmic ray flux depends on the Sun's magnetic parameters (Solanki et al., 2000; Belov et al., 2002; McCracken, 2004). Belov et al. (2002) found a semi-empirical relationship between cosmic ray intensity and solar magnetic parameters. Alanko-Huotari et al. (2006) studied the empirical relationships between the modulation of galactic cosmic rays and global heliospheric parameters; their results showed that the combination of these parameters explains most of the cosmic ray modulation. Solanki et al. (2000) showed the existence of secular variations in sunspot cycles. The Sun's magnetic flux can be estimated from solar magnetic flux, sunspots and the concentration of beryllium-10 (10Be) in polar ice.

According to McCracken (2004), 10Be is one of the cosmogenic nuclides most sensitive to cosmic ray modulation at high latitudes, which produces about 30% to 49% of the observed 10Be. Its deposition is anti-correlated with solar activity, with an eleven-year periodicity, and correlated with cosmic rays (Berggren et al., 2009).

When cosmic rays interact with some of the gases present in the Earth's atmosphere, they ionize them and produce the cosmogenic radionuclides 3He, 10Be, 14C, 21Ne, 26Al and 36Cl (Gosse and Phillips, 2001). The concentrations of these cosmogenic nuclides in the troposphere depend on cosmic ray flux, atmospheric circulation and the altitude of glaciations (Staiger et al., 2007). Cosmic rays also contribute to the formation and removal of the photo-oxidants NOx, HOx and O3, and of trace gases that are important for the production of condensation nuclei, such as H2SO4 and HNO3 (Calisto et al., 2011; Kirkby et al., 2011). Also according to Kirkby et al. (2011), atmospheric ammonia can increase the nucleation rate of sulfuric acid particles by a factor of 100 to 1,000, and binary H2SO4-H2O nucleation can occur in the mid-troposphere but is negligible in the boundary layer.

Spracklen et al. (2008) used a global aerosol microphysics model to predict the contribution of boundary layer (BL) particle formation to regional and global distributions of cloud condensation nuclei (CCN). They found that the CCN formation rate is proportional to gas-phase sulfuric acid, and that global mean CCN concentrations increase by 3-20% (at 0.2% supersaturation) and by 5-50% (at 1% supersaturation). These CCN are important hygroscopic particles in the formation of clouds and rainfall.

A relationship between the Earth's cloud cover and cosmic rays was found by Svensmark and Friis-Christensen (1997). Using satellite databases, they showed that cloud cover varied by 3% to 4% between solar minimum and solar maximum. Staiger et al. (2007) show that the absorption of cosmic rays by the atmospheric mass varies over time owing to a redistribution of atmospheric pressure by ice sheets during glaciations, and that these atmospheric processes alter the production rates of cosmogenic nuclides.

Kniveton and Todd (2001) found evidence of a statistically strong relationship between cosmic ray flux, precipitation and precipitation efficiency over ocean surfaces at mid to high latitudes, with variations of about 4-7% over the solar cycle of the 1980s. Koren et al. (2012) show that aerosols are extremely important for the climate and for the hydrological cycle. Based on precipitation rates from the Tropical Rainfall Measuring Mission (TRMM) and on meteorological data, they found that an increase in aerosol abundance is reflected in a local intensification of the rain rates detected by TRMM and in higher cloud-top heights. The relationship is evident over both ocean and land, and in the tropics, subtropics and mid-latitudes.

Accordingly, Chapter I of this thesis is an initial study of galactic cosmic ray time series. Chapter II determines the periodicities of cosmic rays and solar activity. Chapter III investigates the relationship of rainfall with sunspots and galactic cosmic rays. Chapter IV addresses the relationship between rainfall and both sunspot number and galactic cosmic rays. The last chapter builds a rainfall model driven by exogenous variables such as oceanic anomalies, estimated radiation at the top of the atmosphere, sunspot number and galactic cosmic rays.

The objective of this research was therefore to establish a spatio-temporal relationship between cosmic ray flux, solar activity and other variables, and rainfall in the North and Northeast regions of Brazil.

CHAPTER I

This first chapter corresponds to the article entitled "Data Imputation Analysis for Cosmic Rays Time Series", published in the journal Advances in Space Research. The journal is rated by CAPES Qualis as B2 in GEOSCIENCES and B1 in ENGINEERING III, with an impact factor of 1.409. DOI: <https://doi.org/10.1016/j.asr.2017.02.022>.

Data Imputation Analysis for Cosmic Rays Time Series

R. C. Fernandes (a, *), P. S. Lucio (b), J. H. Fernandez (b)

(a) Programa de Pós-Graduação em Ciências Climáticas, Universidade Federal do Rio Grande do Norte, Natal/RN, 59078970, Brasil, ronabson@hotmail.com.
(b) Departamento de Ciências Atmosféricas e Climáticas, Universidade Federal do Rio Grande do Norte, Natal/RN, 59078970, Brasil, pslucio@ccet.ufrn.br; jhenrix@gmail.com.

Abstract

Missing data in Galactic Cosmic Rays (GCR) time series are inevitable, since data are lost through mechanical and human failure, technical problems and the different periods of operation of GCR stations. The aim of this study was to perform multiple imputation in order to reconstruct the observational dataset. The study used the monthly GCR time series of Climax (CLMX) and Rome (ROME) from 1960 to 2004 to simulate scenarios with 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80% and 90% of the observed ROME series missing, with 50 replicates each. The CLMX station was used as a proxy for filling these scenarios. Three methods for monthly dataset imputation were selected: Amelia II, which runs a bootstrap Expectation Maximization algorithm; MICE, which runs Multivariate Imputation by Chained Equations; and MTSDI, an Expectation Maximization-based method for imputing missing values in multivariate normal time series. The synthetic time series were compared with the observed ROME series using several skill measures: RMSE, NRMSE, Agreement Index, R, R², F-test and t-test. For CLMX and ROME, the R and R² statistics were 0.98 and 0.96, respectively. Increasing the number of gaps reduced the quality of the time series. Data imputation was most efficient with the MTSDI method, which had negligible errors and the best skill coefficients. The results suggest a limit of about 60% of missing data for imputation of monthly averages. It is noteworthy that the CLMX, ROME and KIEL stations have no missing data in the target period. This methodology allowed 43 time series to be reconstructed.

Keywords: Bootstrap, Expectation Maximization, skill, multivariate, chained equations.

1. Introduction

A major problem in the study of Galactic Cosmic Rays (GCR) time series is the difficulty of finding a long-term series without gaps. Data losses can be caused by mechanical or technical failure and by human error. As a result, several GCR studies are restricted to a few stations distributed around the globe. The missing-data problem is not unique to GCR time series; it also occurs in many other areas, such as meteorology, biomedicine and information systems datasets.

Over the past decades, the historical GCR time series has been reconstructed using sunspot numbers and the levels of the cosmogenic 10Be isotope found in both of the Earth's polar caps (Usoskin et al., 2002; Mursula et al., 2003; McCracken, 2004; Usoskin et al., 2005; McCracken and Beer, 2007). This raises some questions: (a) Is it possible to create a synthetic series from another Neutron Monitor (NM) station? (b) What is the criterion for filling data gaps? (c) Which GCR stations should be filled? The main aim of this study was therefore to apply multiple imputation and to analyze its efficiency in reconstructing observational GCR data.

2. Material and Methods

2.1. Dataset

GCR monthly databases from the Russian Academy of Sciences (http://www.wdcb.ru/stp/) and from the Geophysical World Data Center (GWDC) for Solar-Terrestrial Physics (http://www.wdcb.ru/stp/data/cosmic.ray/Neutron_Monitors(monthly_values)/) were used for the 1960-2004 period.

The spatial distribution of the stations is shown in Fig. 1a. Fig. 1b shows the Climax station (CLMX, Lat = 39.37°, Long = -106.1°, Alt = 3,400 m, cut-off rigidity 2.99 GV, 17 NM64) and the Rome station (ROME, Lat = 41.9°, Long = 12.52°, Alt = 60 m, cut-off rigidity 6.32 GV).

Fig. 1. (a) Spatial distribution of GCR NM stations over the globe, in the Mollweide projection, and (b) locations of the Climax (CLMX) and Rome (ROME) GCR NM stations, in the azimuthal equal-area projection.

2.2. Simulation and Assessment

For this study, two GCR stations, CLMX and ROME, were selected because their time series are complete for the chosen period (1960 to 2004). CLMX was adopted as the reference station and ROME as the simulated one. Nine scenarios were prepared for ROME by randomly drawing 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80% and 90% of the observational data (N = 540 months). For each scenario, 50 replicates were performed. A function of the R program was used to perform a random draw based on the total length of the series and the desired amount of missing data. For example, since the series has 540 months, in the 10% scenario 54 observations were drawn and replaced by NA, and the imputations were then performed. The same procedure was applied to the remaining percentages. The initial simulation sought to represent faithfully the many observed series that have missing data. MTSDI performed imputations with the smallest error for both contiguous and randomly scattered gaps.

The R packages Amelia II (Honaker et al., 2011a), MICE (Multivariate Imputation by Chained Equations) and MTSDI (Multivariate Time Series Data Imputation) were used for missing data imputation. The CLMX time series served as a proxy for filling the ROME scenarios. Several synthetic series for ROME were thus created, and each scenario was compared with the original ROME series.

2.2.1 Amelia II

The Amelia II software (Honaker et al., 2011a) uses an expectation maximization (EM) algorithm based on the bootstrap methodology, and is considered fast and robust (Dempster et al., 1977; McLachlan and Krishnan, 1997; Horton and Kleinman, 2007; Honaker et al., 2011a, 2017b). The bootstrap method treats a sample as a "pseudo-population" and randomly generates other data sets of the same size as the original series (Kline, 2015). The resulting distribution has a 95% confidence interval, with sample replication of the average distribution over the bootstrap samples (Schmidheiny, 2012). In this study, 100 replicates were used. To impute a missing value, an imputation value was randomly selected based on the series distribution and on the number of generated replications.

2.2.2 MICE

The MICE method is based on chained equations (Van Buuren et al., 2006; Van Buuren and Groothuis-Oudshoorn, 2011). Synthetic data are allocated separately for each variable, using the other variables as predictors (Kline, 2015). At each step of the algorithm, a given missing value is imputed according to the predictor variable (Kline, 2015). This process is repeated, using the Gibbs sampling procedure, until it reaches convergence as defined by Kline (2015). Because the GCR flux is a continuous variable, a linear regression model was used for its imputation.
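The scenario construction of Section 2.2 and the idea of filling gaps from a proxy station by linear regression can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' R workflow and not the MICE algorithm itself: the two series are synthetic stand-ins (the real CLMX and ROME data are not reproduced here), a single ordinary least squares fit replaces the chained-equations iteration, and all function names are hypothetical.

```python
import math
import random


def make_scenario(series, fraction, rng):
    """Return a copy of `series` with `fraction` of its values replaced by None."""
    n_missing = round(len(series) * fraction)
    missing_idx = set(rng.sample(range(len(series)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(series)]


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def impute_from_proxy(target, proxy):
    """Fill the None entries of `target` with values predicted from `proxy`."""
    pairs = [(p, t) for p, t in zip(proxy, target) if t is not None]
    a, b = fit_line([p for p, _ in pairs], [t for _, t in pairs])
    return [a + b * p if t is None else t for p, t in zip(proxy, target)]


rng = random.Random(42)
N = 540  # months, as in the 1960-2004 period

# Synthetic stand-ins (assumption): a proxy with an 11-year (132-month) cycle
# plus noise, and a target that is linearly related to it.
proxy = [4000 + 300 * math.sin(2 * math.pi * i / 132) + rng.gauss(0, 20) for i in range(N)]
target = [0.9 * p - 500 + rng.gauss(0, 15) for p in proxy]

gapped = make_scenario(target, 0.10, rng)   # 10% scenario: 54 values set to None
filled = impute_from_proxy(gapped, proxy)
print(gapped.count(None), "values removed;", filled.count(None), "left after imputation")
```

Repeating the last three lines for each fraction from 0.10 to 0.90, and for several random seeds, mirrors the nine scenarios and the replicates described in the text.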
2.2.3 MTSDI The MTSDI method (Junger and Leon, 2003a, 2012b) uses the EM algorithm with the Autoregressive Integrated Moving Average (ARIMA) method, also known as Box-Jenkins model (Box and Jenkins, 1976; Meyler, 1998). The data provided by ARIMA (p, d, q) depend.

(23) 23 on the number of autoregressive terms (p), the number of differences (d), and the number of terms in the moving average (q) (Meyler, 1998). Default configuration was used.. 2.3. Evaluation of imputation methods Relying on the goodness-of-fit functions for comparison of simulated and observed hydrological time series, available in the hydroGOF package (Zambrano-Bigiarini, 2014), the performance of data imputations was quantitatively evaluated by comparing synthetic and observed series. Thus, it was calculated: The Root Mean Square Error (RMSE) is the standard deviation of the prediction error model. Lower value indicates better model performance (1). 1. 2 𝑅𝑀𝑆𝐸 = √𝑁 ∑𝑁 𝑖=1(𝑆𝑖 − 𝑂𝑖 ). . (1). The Normalized Root Mean Square Error (NRMSE) is the relative sample standard deviation of the differences between predicted (S i) and observed (Oi) values, given in percentage (Equation 2). 𝑁𝑅𝑀𝑆𝐸 = 100. 1 𝑁. 2 √ ∑𝑁 𝑖=1(𝑆𝑖 −𝑂𝑖 ). 𝑛𝑣𝑎𝑙. (2). 𝑠𝑑(𝑂𝑖 ), 𝑛𝑜𝑟𝑚 = "𝑠𝑑" } Where: 𝑛𝑣𝑎𝑙 = { 𝑂𝑖 𝑚𝑎𝑥 − 𝑂𝑖 𝑚𝑖𝑛, 𝑛𝑜𝑟𝑚 = "𝑚𝑎𝑥𝑚𝑖𝑛" . The Nash-Sutcliffe Efficiency coefficient (NSE) determines the relative magnitude of the residual variance, compared to the data variance measurement (Nash and Sutcliffe, 1970). NSE can range from negative infinity to 1, with 1 indicating perfect fit (Equation 3): ∑𝑁 (𝑆 −𝑂 )2. 𝑖 𝑖 𝑁𝑆𝐸 = 1 − ∑𝑖=1 𝑁 (𝑂 −𝑂 ̅ )2 𝑖=1. . 𝑖. (3). The Agreement Index (d) is a standard measure of the prediction error model ranging from 0 to 1 (Willmott, 1981), where 1 indicates perfect match and 0 indicates no agreement at all. This index is sensitive to extreme values due to squared differences (Legates and McCabe, 1999) (Equation 4). 𝑑 = 1 − ∑𝑁. 2 ∑𝑁 𝑖=1(𝑂𝑖 −𝑆𝑖 ) (|𝑆𝑖 −𝑂̅|+|𝑂𝑖 −𝑂̅|)2. (4). 𝑖=1. . The Linear Correlation Coefficient (R) is the covariance (s) between O i (observed) and Si (predicted), standard deviations of observed and simulated data, respectively. This coefficient is given by (Equation 5):.

$$R = \frac{s_{O_i S_i}}{s_{O_i}\, s_{S_i}} \tag{5}$$

The Determination Coefficient (R²) is given by (Equation 6):

$$R^2 = \frac{\sum_{i=1}^{N}(S_i - \bar{O})^2}{\sum_{i=1}^{N}(O_i - \bar{O})^2} \tag{6}$$

Subsequently, the F-test was applied to the observed and simulated ROME series, with significance level α = 0.05, where $H_0: \sigma^2_{ROME} = \sigma^2_{SIM}$ and $H_1: \sigma^2_{ROME} \neq \sigma^2_{SIM}$.

Then, the t-test was also applied to the observed and simulated ROME series, with significance level α = 0.05, where $H_0: \mu_{ROME} = \mu_{SIM}$ and $H_1: \mu_{ROME} \neq \mu_{SIM}$.

The coefficients and time series were computed with the free, open-source R Project for Statistical Computing, version 3.1.2 (R, 2014). The graphics and figures were produced with the lattice package (Sarkar, 2008).

4. Results and discussion

Fig. 2a shows that the CLMX and ROME time series have similar profiles, differing in the intensity of the GCR flux. The correlation between these two series corresponds to R = 0.98 and R² = 0.96 (Fig. 2b). It is well known that GCR are modulated by the magnetic field of the Sun, Earth's magnetic rigidity, geographical position and altitude (Usoskin et al., 2005; Zhou et al., 2006; Herbst et al., 2013; Ahluwalia, 2014). These modulations correspond to time scales ranging from 11 years to secular variations (Solanki et al., 2000; Berggren et al., 2009).

Fig. 2. GCR intensity: (a) Climax (in blue) and Rome (in magenta) monthly observed GCR time series and (b) their correlation, for the period from January 1st, 1960 to December 31st, 2004.
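The goodness-of-fit measures of Section 2.3 (Equations 1 to 5) can be sketched in a few lines. The study itself used the R package hydroGOF; the block below is only an illustrative Python stand-in using the standard library, and all function names are assumptions introduced here, not part of hydroGOF.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def sample_sd(xs):
    # Sample standard deviation (N - 1 in the denominator); assumed for nval.
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

def rmse(sim, obs):
    # Equation 1: root mean square error.
    return math.sqrt(mean([(s - o) ** 2 for s, o in zip(sim, obs)]))

def nrmse(sim, obs, norm="sd"):
    # Equation 2: RMSE normalized by sd(obs) or by the observed range, in percent.
    if norm == "sd":
        nval = sample_sd(obs)
    elif norm == "maxmin":
        nval = max(obs) - min(obs)
    else:
        raise ValueError("norm must be 'sd' or 'maxmin'")
    return 100.0 * rmse(sim, obs) / nval

def nse(sim, obs):
    # Equation 3: Nash-Sutcliffe efficiency.
    ob = mean(obs)
    num = sum((s - o) ** 2 for s, o in zip(sim, obs))
    den = sum((o - ob) ** 2 for o in obs)
    return 1.0 - num / den

def agreement_index(sim, obs):
    # Equation 4: Willmott's index of agreement d.
    ob = mean(obs)
    num = sum((o - s) ** 2 for s, o in zip(sim, obs))
    den = sum((abs(s - ob) + abs(o - ob)) ** 2 for s, o in zip(sim, obs))
    return 1.0 - num / den

def pearson_r(sim, obs):
    # Equation 5: covariance divided by the product of standard deviations.
    sb, ob = mean(sim), mean(obs)
    cov = sum((s - sb) * (o - ob) for s, o in zip(sim, obs))
    var_s = sum((s - sb) ** 2 for s in sim)
    var_o = sum((o - ob) ** 2 for o in obs)
    return cov / math.sqrt(var_s * var_o)

# Hypothetical toy series: a simulation biased by +1 with respect to the observations.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
simulated = [2.0, 3.0, 4.0, 5.0, 6.0]
print(rmse(simulated, observed), nse(simulated, observed), pearson_r(simulated, observed))
```

In this toy case the constant bias leaves R at 1 while lowering NSE and d, which illustrates why the correlation coefficient alone is not sufficient to judge an imputation.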

Increasing the percentage of imputed data (from 10% to 90% of imputation, in 10% steps) is negatively reflected in the synthetic series, with a gradual loss of efficiency across the respective scenarios (1-9). Analyzing the d index (Willmott, 1981), which ranges from 0 (worst model) to 1 (best model), it was found that MTSDI (0.999 ± 0.001 ≥ d_MTSDI ≥ 0.977 ± 0.03) behaved similarly to MICE (0.998 ± 0.001 ≥ d_MICE ≥ 0.965 ± 0.009), with MTSDI obtaining the best performance. Amelia II (0.953 ≥ d_AMELIA ≥ 0.475) (Fig. 3a) provided the most discrepant results and was considered the worst model for these imputations.

Fig. 3b shows that the best correlations for the different scenarios were obtained with MTSDI (0.998 ± 0.001 ≥ R_MTSDI ≥ 0.976 ± 0.013) and MICE (0.996 ± 0.001 ≥ R_MICE ≥ 0.965 ± 0.009). Amelia II (0.908 ≥ R_AMELIA ≥ 0.12) significantly corrupted the series, showing the worst correlations.

Fig. 3c shows that the MTSDI (0.997 ± 0.001 ≥ R²_MTSDI ≥ 0.953 ± 0.026) and MICE (0.992 ± 0.003 ≥ R²_MICE ≥ 0.88 ± 0.023) models yielded determination coefficients close to 1, with MTSDI giving the best fit. Once again, Amelia II (0.825 ≥ R²_AMELIA ≥ 0.016) proved to be the worst model, showing significant correlation losses as the amount of imputation increased.

Fig. 3d shows the NSE coefficients for scenarios 1 to 9 (from 10% to 90% of missing data, respectively). Both NSE_MICE (0.992 ± 0.003 ≥ NSE_MICE ≥ 0.876 ± 0.026) and NSE_MTSDI (0.997 ± 0.001 ≥ NSE_MTSDI ≥ 0.930 ± 0.072) were satisfactory; however, the efficiency of MTSDI was greater than that of MICE. Again, Amelia II was the most discrepant model in these simulations, obtaining the lowest efficiency.
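The nine missing-data scenarios described above (10% to 90% of the series removed, in 10% steps) can be sketched as follows. This is a minimal illustration, assuming gaps are placed uniformly at random; the function names, the use of None as the missing-value marker and the seeding scheme are assumptions made here, not the author's procedure.

```python
import random

def make_scenario(series, fraction, seed=0):
    """Return a copy of the series with the given fraction of values set to None."""
    rng = random.Random(seed)
    n_missing = round(fraction * len(series))
    gaps = set(rng.sample(range(len(series)), n_missing))
    return [None if i in gaps else value for i, value in enumerate(series)]

def make_all_scenarios(series, seed=0):
    """Scenarios 1 to 9 correspond to 10% to 90% of missing data."""
    return {k: make_scenario(series, k / 10.0, seed + k) for k in range(1, 10)}

# Hypothetical complete monthly series, January 1960 to December 2004 (540 months).
complete = [float(t) for t in range(540)]
scenarios = make_all_scenarios(complete)
for k, gapped in scenarios.items():
    print(k, sum(v is None for v in gapped))
```

Each gapped series would then be passed to an imputation method and the reconstructed values compared with the original ones; repeating the procedure with different seeds gives the repetitions over which means and standard deviations can be computed.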

Fig. 3. Average coefficients (a) d, (b) R, (c) R² and (d) NSE for the respective imputation percentages, calculated by the Amelia II (blue), MICE (green) and MTSDI (red) models.

Fig. 4 shows that NRMSE_MTSDI and NRMSE_MICE were lower than 24.23 ± 10.64% and 34.978 ± 3.588%, respectively, in all scenarios. Imputations made by the Amelia II method did not produce satisfactory results, with NRMSE_AMELIA yielding 42.798% for scenario 1 (10% of imputations) and 142.022% for scenario 9 (90% of imputations).

Fig. 4. Average NRMSE coefficients for the respective imputation scenarios in the Amelia II (blue), MICE (green) and MTSDI (red) models.

The series resulting from the imputations made by Amelia II (Fig. 5a) did not represent the expected behavior of the ROME time series in any of the scenarios, which explains the poor coefficients reported above. MICE (Fig. 5b) and MTSDI (Fig. 5c) fitted the real series well (in accordance with the indexes obtained above), and both were capable of reproducing the expected profile of the ROME time series, although the MTSDI model provided the best fits.

The adopted Amelia II methodology failed to reproduce the behavior of the observed series satisfactorily. Amelia II is based on the bootstrapping method, which resamples the series 100 times at random, and the missing data are imputed in accordance with the sample distribution. Amelia II performed poorly, while MICE and MTSDI best represented the observed data. The Amelia II package assumes that all variables in a dataset follow a multivariate normal (MVN) distribution and uses the mean and covariance to summarize the data. Because the imputation is carried out randomly, it failed to represent the observed GCR data. MICE assumes that the probability of a value being missing depends only on the observed values, so these can be used to estimate it; the imputation is made by linear regression and the results are then combined. MTSDI, based on the EM algorithm, takes the temporal correlation structure of the GCR series into account and models it with autoregressive integrated moving average models, ARIMA(p, d, q). Thus, MTSDI could better detect the behavior patterns of the GCR time series.

Fig. 5. Comparison between the original ROME station GCR time series and the series reconstructed for missing-data scenarios 1-9, for the period from January 1960 to December 2004, as obtained using the (a) Amelia II, (b) MICE and (c) MTSDI models.

Could the ROME station serve as a proxy for the imputation of missing CLMX data? Yes. The procedures described above were repeated, simulating the

CLMX series. The results were similar for Amelia II (Fig. 6a), MICE (Fig. 6b) and MTSDI (Fig. 6c), with MTSDI presenting the best result.

Fig. 6. Comparison between the original CLMX station GCR time series and the series reconstructed for missing-data scenarios 1-9, for the period from January 1960 to December 2004, as obtained using the (a) Amelia II, (b) MICE and (c) MTSDI models.

For Amelia II, the 5% significance threshold was crossed already from the third simulation scenario (30% of missing data) (Fig. 7a), whereas this occurred only from the eighth scenario (80% of missing data) for MICE (Fig. 7b) and from the sixth scenario (60% of missing data) for MTSDI (Fig. 7c). Below these percentages of missing data (p > 0.05), the reconstructed series do not differ significantly from the original one, so the hypothesis of equal variances is not rejected.

Fig. 7. F-test for the ROME station using the (a) Amelia II, (b) MICE and (c) MTSDI models, by percentage of missing data (scenario) and repetition.

In the t-test, the 5% significance threshold was crossed already from the third simulation scenario (30% of missing data) for Amelia II (Fig. 8a), from the eighth scenario (80% of missing data) for MICE (Fig. 8b) and from the sixth scenario (60% of missing data) for MTSDI (Fig. 8c). Again, below these limits the reconstructed series do not differ significantly from the original ROME series, so the hypothesis of equal means is not rejected. Data imputation performed by the MTSDI method was more efficient, yielding p-values close to 1.

Fig. 8. t-test for the ROME station using the (a) Amelia II, (b) MICE and (c) MTSDI models, by percentage of missing data (scenario) and repetition.

According to the F-test, the significance threshold was crossed at 60% of missing data, which suggests this value as the limit for imputation in GCR time series of monthly averages. At 60% of missing data, the mean NRMSE, d, R and R² coefficients were equal to 14.598 ± 4.662%, 0.994 ± 0.01, 0.989 ± 0.012 and 0.977 ± 0.023, respectively. Other studies have used different percentages, such as 33% (Rodwell et al., 2014), 20% (Nunes et al., 2009) and 27.1% (Nunes et al., 2010). Taking the time series from 1960 to 2004 as 100%, stations with up to 60% of missing data in this interval were selected, even when they operated in different periods.

Fig. 9a shows the GCR stations around the world that had up to 60% of missing data, a total of 43 stations for the period analyzed here (1960-2004). In the same period, the CLMX, KIEL and ROME stations showed no missing data at all (0% of missing data). The JUNG and JUNG1 stations had 0.4% (2 months) and 59.8% (323 months) of missing data, respectively. All the time series plotted in Fig. 9a present similar behavior. After the imputation process, various synthetic series were created, as shown in Fig. 9b, displayed to the right of the observed data plot for visual comparison. The efficiency of the imputations was verified by cross-correlation.
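The missing-data percentages quoted above follow from simple arithmetic on the 540 months between January 1960 and December 2004. The sketch below reproduces them and applies the 60% selection limit; the function names and the eligibility helper are hypothetical, introduced only for this check.

```python
def missing_percent(n_missing, n_total):
    """Percentage of missing months in a series of n_total months."""
    return 100.0 * n_missing / n_total

# January 1960 to December 2004: 45 years of monthly values.
N_MONTHS = (2004 - 1960 + 1) * 12

def eligible(n_missing, n_total=N_MONTHS, limit=60.0):
    """A station is kept when its missing percentage does not exceed the limit."""
    return missing_percent(n_missing, n_total) <= limit

print(N_MONTHS)
print(round(missing_percent(2, N_MONTHS), 1))    # JUNG: 2 missing months
print(round(missing_percent(323, N_MONTHS), 1))  # JUNG1: 323 missing months
print(eligible(323), eligible(330))
```

With 323 missing months, JUNG1 sits just under the limit, which is why it is still among the selected stations.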

Fig. 9. GCR time series: (a) original and (b) with imputations.

5. Summary and Final Remarks

Given the importance of GCR time series, both the problem of missing data and the criteria for their imputation were investigated. The results were satisfactory and show that reliable synthetic series can be created with different imputation methods, namely MICE and MTSDI. In this study, the Amelia II method proved to be inefficient compared with the others. To verify the quality of the imputations, the results obtained with the "goodness-of-fit" measures were compared, showing that these indexes can be used

to compare the models applied. The present analysis suggests that, with the methodology adopted, up to 60% of missing data in a time series is acceptable for creating a reliable synthetic series, provided that the gaps are randomly distributed over the series. Following the criteria adopted, the series of 43 GCR stations were completed by imputation, producing various synthetic series. Based on these results, this study recommends the use of imputation methods to complete gapped GCR time series.

Acknowledgments

To the Brazilian Coordination for the Improvement of Higher Education Personnel (CAPES) and to the Research Support Foundation of Rio Grande do Norte State (FAPERN) for granting a doctoral fellowship to the author. Paulo S. Lucio is sponsored by a PQ2 grant (Proc. 307988/2013-9) from CNPq (Brazil). The author thanks George U. Pedra for his contributions to this article.

References

Ahluwalia, H. S. Sunspot activity and cosmic ray modulation at 1 a.u. for 1900–2013. Advances in Space Research, 54(8), 1704-1716, 2014.

Berggren, A.M., Beer, J., Possnert, G., Aldahan, A., Kubik, P., Christl, M., Johnsen, S. J., Abreu, J., Vinther, B. M. A 600-year annual ¹⁰Be record from the NGRIP ice core, Greenland. Geophys. Res. Lett., 36(11), L11801, doi: 10.1029/2009GL038004, 2009.

Box, G. E. P., Jenkins, G. M. Time Series Analysis: Forecasting and Control, 2nd ed., San Francisco, Holden-Day, 1976.

Dempster, A. P., Laird, N. M., Rubin, D. B. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39, 1-38, 1977.

Herbst, K., Kopp, A., Heber, B. Influence of the terrestrial magnetic field geometry on the cutoff rigidity of cosmic ray particles. Ann. Geophys., 31, 1637–1643, 2013.

Honaker, J., King, G., Blackwell, M. Amelia II: A Program for Missing Data. Journal of Statistical Software, 45(7), 1-47, 2011a.

Honaker, J., King, G., Blackwell, M. Amelia II: A Program for Missing Data. http://gking.harvard.edu/amelia, 2017b.

Horton, N.J., Kleinman, K.P. Much ado about nothing: A comparison of missing data methods and software to fit incomplete data regression models. The American Statistician, 61(1), 79-90, 2007.

Junger, W., Leon, A.P., Santos, N. Missing data imputation in multivariate time series via EM Algorithm. Cadernos do IME, 15, 8–21, 2003a.

Junger, W., Leon, A. P. mtsdi: Multivariate time series data imputation. R package 0.3, 3, https://CRAN.R-project.org/package=mtsdi, 2012b.

Kline, R.B. Bootstrapping, in: Principles and Practice of Structural Equation Modeling. New York, NY, Guilford Press, 4th Ed., pp. 60-61, 2015.

Legates, D.R., McCabe, G.J. Evaluating the use of "goodness-of-fit" measures in hydrologic and hydroclimatic model validation. Water Resources Research, 35(1), 233-241, 1999.

McCracken, K. G. Geomagnetic and atmospheric effects upon the cosmogenic ¹⁰Be observed in polar ice. Journal of Geophysical Research, 109, A04101, doi:10.1029/2003JA010060, 2004.

McCracken, K. G., Beer, J. Long term changes in the cosmic ray intensity at Earth, 1428–2005. Journal of Geophysical Research: Space Physics, 112, doi:10.1029/2006JA012117, 2007.

McLachlan, G. J., Krishnan, T. The EM algorithm and extensions. John Wiley and Sons, New York, NY, 1997.

Meyler, A., Kenny, G., Quinn, T. Forecasting Irish inflation using ARIMA models, in: Central Bank and Financial Services Authority of Ireland Technical Paper Series. Munich, Germany, No. 3/RT/98 (December 1998), 1-48, 1998.

Mursula, K., Usoskin, I. G., Kovaltsov, G. A. Reconstructing the long-term cosmic ray intensity: Linear relations do not work. Ann. Geophys., 21(4), 863-867, 2003.

Nash, J. E., Sutcliffe, J.V. River flow forecasting through conceptual models part I-A discussion of principles. Journal of Hydrology, 10(3), 282-290, 1970.

Nunes, L.N., Klück, M.M., Fachel, J.M.G. Uso da imputação múltipla de dados faltantes: uma simulação utilizando dados epidemiológicos. Cadernos de Saúde Pública (Reports in Public Health), Rio de Janeiro, 25(2), 268-278, 2009. (in Portuguese).

Nunes, L.N., Klück, M.M., Fachel, J.M.G. Comparação de métodos de imputação única e múltipla usando como exemplo um modelo de risco para mortalidade cirúrgica. Rev. Bras. Epidemiol., 13(4), 596-606, 2010. (in Portuguese).

R Development Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN: 3-900051-07-0, 2014.

Rodwell, L., Lee, K.J., Romaniuk, H., Carlin, J.B. Comparison of methods for imputing limited-range variables: a simulation study. BMC Medical Research Methodology, 14, doi: 10.1186/1471-2288-14-57, 2014.

Sarkar, D. Lattice: Multivariate Data Visualization with R. Springer-Verlag, New York, NY, ISBN: 978-0-387-75968-5, 2008.

Schmidheiny, K. The bootstrap, in: Short Guides to Microeconometrics. Spring, Basel, Switzerland: Universität Basel, 1–10, http://kurt.schmidheiny.name/teaching/bootstrap2up.pdf, 2012.

Solanki, S. K., Schüssler, M., Fligge, M. Evolution of the Sun's large-scale magnetic field since the Maunder minimum. Nature, 408, 445-447, 2000.

Usoskin, I. G., Mursula, K., Solanki, S. K., Schüssler, M., Kovaltsov, G. A. A physical reconstruction of cosmic ray intensity since 1610. Journal of Geophysical Research: Space Physics, 107(A11), doi:10.1029/2002JA009343, 2002.

Usoskin, I. G., Alanko-Huotari, K., Kovaltsov, G. A., Mursula, K. Heliospheric modulation of cosmic rays: Monthly reconstruction for 1951–2004. Journal of Geophysical Research: Space Physics, 110(A12), doi: 10.1029/2005JA011250, 2005.

Van Buuren, S., Brand, J. P., Groothuis-Oudshoorn, C. G. M., Rubin, D. B. Fully conditional specification in multivariate imputation. Journal of Statistical Computation and Simulation, 76(12), 1049-1064, 2006.

Van Buuren, S., Groothuis-Oudshoorn, K. MICE: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45(3), doi: 10.18637/jss.v045.i03, 2011.

Zambrano-Bigiarini, M. hydroGOF: Goodness-of-fit functions for comparison of simulated and observed hydrological time series. R package version 0.3-8, http://CRAN.R-project.org/package=hydroGOF, 2014.

Zhou, D., O'Sullivan, D., Semones, E., Heinrich, W. Radiation field of cosmic rays measured in low Earth orbit by CR-39 detectors. Advances in Space Research, 37(9), 1764-1769, 2006.

Willmott, C. J. On the validation of models. Physical Geography, 2(2), 184-194, 1981.

CHAPTER II

This chapter refers to the article entitled "Periodic determination of the galactic cosmic rays with application of The Maximal Overlap Discrete Wavelet Transform (MODWT)", for submission. This journal is classified by the CAPES Qualis indicators as B1 in the GEOSCIENCES area, with an impact factor of 1.326.

Periodic determination of the galactic cosmic rays with application of the Maximal Overlap Discrete Wavelet transform (MODWT)

Ronabson C. Fernandes a,*, Paulo S. Lucio b, José H. Fernandez b

a Programa de Pós-Graduação em Ciências Climáticas, Universidade Federal do Rio Grande do Norte, Natal/RN, 59078970, Brasil, ronabson@hotmail.com

b Departamento de Ciências Atmosféricas e Climáticas, Universidade Federal do Rio Grande do Norte, Natal/RN, 59078970, Brasil, pslucio@ccet.ufrn.br ; jhenrix@gmail.com

* Corresponding author. Tel.: +55-84-3342-2479.

Abstract

Galactic cosmic rays (GCR) are important for various physical and chemical processes of the Earth's atmosphere, and their attenuation brings about significant changes in these processes. We therefore examined whether the periodicity of GCR changes with the spatial position of the GCR stations. The Maximal Overlap Discrete Wavelet Transform (MODWT) technique, based on the Haar Wavelet (HW), was applied to the GCR time series of Kiel, Rome, Climax and Huancayo. The results showed that the energy percentage of the 5.3-year periodicity increases from higher to lower latitude, while the periodicities of 10.6 years and 22.3 years are considerably attenuated.

Keywords: wavelet haar, periodicity, changes.

1. Introduction

It is known that the intensity of Galactic Cosmic Rays (GCR) is strongly attenuated by solar activity and by the solar and terrestrial magnetic fields (Lockwood and Webber, 1967; Solanki et al., 2000; McCracken, 2004). Studies have shown that the GCR periodicity is

approximately 11 years (Lockwood and Webber, 1967). At short- to medium-term periodicities, the GCR flux, the solar magnetic flux and other terrestrial and interplanetary phenomena are correlated, and the GCR flux is strongly attenuated at maximum solar activity (Valdés-Galicia, 2005; Chowdhury et al., 2010). The duration of these events is important for terrestrial cloudiness, cosmogenic aerosol production, rainfall and atmospheric circulation (Svensmark and Friis-Christensen, 1997; Lucio, 2005; Zarrouk and Bennaceur, 2010; Calisto et al., 2011; Kirkby et al., 2011; Veretenenko and Ogurtsov, 2012).

Several terrestrial and interplanetary factors modulate the intensity of GCR. The hypothesis of this study is therefore that the geographic position of GCR stations changes their periodicities. Thus, the objective of this study was to verify the distribution of periodicities for different neutron monitor (NM) stations distributed around the globe.

2. Material and Methods

2.1. Dataset

We used the monthly cosmic ray series of the Russian Academy of Sciences, Geophysical Center, World Data Center (WDC) for Solar-Terrestrial Physics, from 1960 to 2004 (http://www.wdcb.ru/stp/data/cosmic.ray/Neutron_Monitors(monthly_values)/). The NM stations used were Climax (CLMX, Lat = 39.37°, Long = -106.1°, Alt = 3,400 m, cutoff rigidity = 2.99 GV, 17 NM64), Rome (ROME, Lat = 41.9°, Long = 12.52°, Alt = 60 m, cutoff rigidity = 6.32 GV), Kiel (Lat = 54.30°, Long = 10.10°, Alt = 54 m, cutoff rigidity = 2.29 GV) and Huancayo (Lat = -12.03°, Long = -75.33°, Alt = 3,400 m, cutoff rigidity = 13.45 GV). All NM stations had complete monthly time series, with the exception of Huancayo, whose gaps were imputed according to Fernandes et al. (2017).

Figure 1. Spatial position of NM stations (Kiel, Rome, Climax and Huancayo).

The historical sunspot number (SSN) series was obtained from the website of the Solar Influences Data Center (SIDC), Sunspot Index and Long-term Solar Observations (SILSO), of the Royal Observatory of Belgium, Brussels (<http://www.sidc.be/silso/home>). The monthly averages of the solar radio flux at a wavelength of 10.7 centimeters (F10.7) were obtained from the website of Natural Resources Canada (NRCAN) <ftp://ftp.geolab.nrcan.gc.ca/data/solar_flux/monthly_averages/solflux_monthly_average.txt>.

2.2. Maximal overlap discrete wavelet transform (MODWT)

The absolute deviation of each GCR time series was computed in order to make the data stationary. The Maximal Overlap Discrete Wavelet Transform (MODWT), based on the Haar Wavelet (HW), was then applied to each time series, which was decomposed into 9 crystals (from d1 to d9). The percentage of energy distributed in each crystal of each decomposed series was also calculated. Each crystal represents a periodicity of 2^n months, where n = 1, 2, 3, ..., 9 is the crystal number. Thus, 2^1, 2^2, 2^3, ..., 2^9 correspond to 2 months, 4 months, 8 months, ..., 512 months (42.6 years). The MODWT corresponds to the decomposition of the Multiresolution Analysis (MRA) of the time series. The MRA is additive over the periodic scales of the decomposition.
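The correspondence between crystal number and periodicity stated above can be checked with a short calculation. The sketch below assumes monthly sampling and simply evaluates 2^n months for n = 1 to 9; the function name is introduced here for illustration only.

```python
def crystal_periods(n_levels=9, months_per_year=12):
    """Return (level, period in months, period in years) for crystals d1..dn."""
    return [(n, 2 ** n, 2 ** n / months_per_year) for n in range(1, n_levels + 1)]

for level, months, years in crystal_periods():
    print(f"d{level}: {months:4d} months = {years:5.2f} years")
```

For example, d6 corresponds to 64 months (64/12, about 5.3 years), d7 to 128 months (about 10.7 years) and d9 to 512 months (about 42.7 years).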

This technique decomposes a time series X(t) into a sum of simpler time series, named Smooth (S_J) and Details (D_j). The details D_j correspond to the changes at each scale, and the smooth S_J contains the average level at the coarsest scale (Equation 1):

$$X = \sum_{j=1}^{J-1} D_j + S_J \tag{1}$$

The MODWT filters are obtained by Equation (2), where $\tilde{h}_{j,l}$ is the MODWT wavelet filter, $\tilde{g}_{j,l}$ is the MODWT scaling filter, $h_{j,l}$ and $g_{j,l}$ are the corresponding filters before rescaling, and l = 0, ..., L − 1, with L the length of the filter at decomposition level j:

$$\tilde{h}_{j,l} = \frac{h_{j,l}}{2^{j/2}}, \qquad \tilde{g}_{j,l} = \frac{g_{j,l}}{2^{j/2}} \tag{2}$$

Thus, the MODWT wavelet and scaling coefficients of level j can be defined as the circular convolution of the time series with the MODWT filters (Equation 3):

$$W_{j,t} = \sum_{l=0}^{L-1} \tilde{h}_{j,l}\, X_{t-l \bmod N}, \qquad V_{j,t} = \sum_{l=0}^{L-1} \tilde{g}_{j,l}\, X_{t-l \bmod N} \tag{3}$$

The coefficient vectors of Equation (3) have the same length as the original signal X, so the transform can be written in matrix form (Equation 4):

$$\vec{W}_j = \bar{\bar{W}}_j X, \qquad \vec{V}_j = \bar{\bar{V}}_j X \tag{4}$$

Finally, the Details and Smooths that reconstruct the original series X are defined by Equation (5):

$$\vec{D}_j = \bar{\bar{W}}_j^{T}\, \vec{W}_j, \qquad \vec{S}_j = \bar{\bar{V}}_j^{T}\, \vec{V}_j \tag{5}$$

The R program (R, 2015) was used for the calculations and for running the algorithms. The algorithms used to produce the figures, maps and decompositions are available in the maps (Becker and Wilks, 2014), mapproj (McIlroy, 2014), maptools (Bivand and Lewin-Koh, 2016), lattice (Sarkar, 2008) and wmtsa (Constantine and Percival, 2013) packages.
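To make the decomposition more concrete, the following is a minimal pure-Python sketch of a Haar MODWT computed with the usual pyramid scheme and circular boundary, together with the percentage of energy per crystal. The study used the R package wmtsa; this block is not that implementation, and the function names, the normalization of the energy by the total energy of the input series, and the synthetic test signal are all assumptions made for illustration.

```python
import math

def haar_modwt(x, levels):
    """Haar MODWT by the pyramid scheme with circular boundary.

    Returns (details, smooth): the wavelet coefficients W_1..W_J and the
    scaling coefficients V_J, each with the same length as x.
    """
    n = len(x)
    v = list(x)
    details = []
    for j in range(1, levels + 1):
        shift = 2 ** (j - 1)  # lag between the two Haar filter taps at level j
        w_next = [(v[t] - v[(t - shift) % n]) / 2.0 for t in range(n)]
        v_next = [(v[t] + v[(t - shift) % n]) / 2.0 for t in range(n)]
        details.append(w_next)
        v = v_next
    return details, v

def energy(coeffs):
    return sum(c * c for c in coeffs)

def energy_percentages(x, levels):
    """Percentage of the series energy held by each crystal and by the smooth."""
    details, smooth = haar_modwt(x, levels)
    total = energy(x)
    pct = [100.0 * energy(w) / total for w in details]
    pct_smooth = 100.0 * energy(smooth) / total
    return pct, pct_smooth

# Hypothetical monthly signal with a 96-month period (five full cycles).
series = [math.sin(2 * math.pi * t / 96) for t in range(480)]
pct, pct_smooth = energy_percentages(series, 8)
dominant = pct.index(max(pct)) + 1
print("dominant crystal: d%d" % dominant)
```

Because the MODWT conserves energy, the crystal percentages and the smooth percentage add up to 100%. A 96-month oscillation falls between 64 and 128 months, so most of its energy is expected in crystal d6, with leakage into neighboring crystals because the Haar filter is a poor band-pass filter.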

3. Results

Figure 2 shows that the time series of Kiel, Rome, Climax and Huancayo have similar temporal behavior, differing in GCR intensity. The GCR intensity decreases from higher (Kiel) to lower latitude (Huancayo).

Figure 2. Time series of NM stations (a) Kiel, (b) Rome, (c) Climax and (d) Huancayo for the period from January 1st, 1960 to December 31st, 2004.

The multiresolution decomposition of the time series of Kiel (Figure 3a), Rome (Figure 3b), Climax (Figure 3c) and Huancayo (Figure 3d) shows that the main oscillations were concentrated in crystals d6 (5.3 years) and d7 (10.6 years). This result diverges from the study of El-Borie et al. (2011), which shows that the observed differences are related to the 22-year cycle.

Figure 3. MODWT multiresolution decomposition of the time series of (a) Kiel, (b) Rome, (c) Climax and (d) Huancayo for the period from January 1st, 1960 to December 31st, 2004.

The energies were concentrated in the periodicities of 5.3 years (d6) and 10.6 years (d7) for Kiel (Figure 4a), Rome (Figure 4b), Climax (Figure 4c) and Huancayo (Figure 4d). The energy percentage of the 5.3-year periodicity (d6) increased from the highest to the lowest latitude: Kiel (32.7%), Rome (36.8%), Climax (37.7%) and Huancayo (41.7%). A relation with latitude was not very evident for the Schwabe cycle (10.6 years), whose energy percentage ranged from 28.8% for Huancayo to 36.3% for Climax. For the Hale cycle (22 years) there was a significant decrease between Kiel (12.2%) and Huancayo (2.8%), which are located at high and low latitudes, respectively.

It is known that these modulations of the GCR fluxes are related to the solar and terrestrial magnetic fields and to the cutoff rigidity (Lockwood and Webber, 1967). Lemaitre and Vallarta (1933) showed that the GCR intensity varies with latitude because of the interaction with the Earth's magnetic field. Thus, the differences in the terrestrial magnetic field between regions gave rise to this spatial difference in periodicities.

Figure 4. Percentage energy distribution of the MODWT MRA, by crystal, for the time series of Kiel (a), Rome (b), Climax (c) and Huancayo (d), for the period from January 1st, 1960 to December 31st, 2004.

The percentage distribution of energy per periodicity for both SSN (Figure 5a) and the solar radio flux F10.7 (Figure 5b) agrees with the percentage distribution of the GCR flux at the stations of Kiel, Rome, Climax and Huancayo. The 5.3-year cycle corresponded to 41.2% and 39.7% of the energy for SSN and F10.7, respectively. The Schwabe cycle (10.6 years) accounted for 38.4% and 40.6%, respectively.

An extensive literature search found no previous reports of these percentages at the respective periodicities, for either solar activity or GCR. It is worth mentioning that this percentage distribution is valid for the period studied and may differ for a longer or shorter series.

Figure 5. Percentage energy distribution of the MODWT multiresolution analysis, by crystal, for the time series of (a) SSN and (b) solar radio flux at 10.7 cm, for the period from January 1st, 1960 to December 31st, 2004.

4. Conclusions

Several studies show that the GCR intensity is attenuated by solar activity and by the terrestrial magnetic field, among other factors. We verified whether the percentage distribution of the periodicities changes with the spatial position of the NM stations. The results showed that the geographical position of the NM stations interferes with the intensity and frequency of GCR. The energy percentage of the 5.3-year periodicity increased from higher to lower latitude, with the reverse occurring for the periodicities of 10.6 years and 22.3 years.

Acknowledgments

To the Brazilian Coordination for the Improvement of Higher Education Personnel (CAPES) and to the Research Support Foundation of Rio Grande do Norte State (FAPERN) for granting a doctoral fellowship to the author. Paulo S. Lucio is sponsored by a PQ2 grant (Proc. 307988/2013-9) from CNPq (Brazil).

References

Becker, R.A., Wilks, A.R. maps: Draw Geographical Maps. R package version 2.3-9, http://CRAN.R-project.org/package=maps, 2014.

Belov, A. V., Gushchina, R. T., Obridko, V. N., Shelting, B. D., Yanke, V. G. Connection of the long-term modulation of cosmic rays with the parameters of the global magnetic field of the Sun. Geomagnetism and Aeronomy c/c of Geomagnetizm i Aeronomiia, 42(6), 693-700, 2002.

Bivand, R., Lewin-Koh, N. maptools: Tools for Reading and Handling Spatial Objects. R package version 0.8-39, http://CRAN.R-project.org/package=maptools, 2016.

Calisto, M., Usoskin, I., Rozanov, E., Peter, T. Influence of Galactic Cosmic Rays on atmospheric composition and dynamics. Atmos. Chem. Phys., 11, 4547–4556, doi:10.5194/acp-11-4547-2011, 2011.

Chowdhury, P., Khan, M., Ray, P. C. Evaluation of the short and intermediate term periodicities in cosmic ray intensity during solar cycle 23. Planetary and Space Science, 58(7), 1045-1049, doi: http://dx.doi.org/10.1016/j.pss.2010.04.005, 2010.

Constantine, W., Percival, D. wmtsa: Wavelet Methods for Time Series Analysis. R package version 2.0-0, http://CRAN.R-project.org/package=wmtsa, 2013.

El-Borie, M. A., Aly, N. A., El-Taher, A. Mid-term periodicities of cosmic ray intensities. Journal of Advanced Research, 2(2), 137-147, doi: http://dx.doi.org/10.1016/j.jare.2010.10.002, 2011.

Fernandes, R.C., Lucio, P.S., Fernandez, J.H. Data Imputation Analysis for Cosmic Rays Time Series, doi: http://dx.doi.org/10.1016/j.asr.2017.02.022, 2017.

Kirkby, J. et al. Role of sulphuric acid, ammonia and galactic cosmic rays in atmospheric aerosol nucleation. Nature, 476(7361), 429-433, doi:10.1038/nature10343, 2011.

Lockwood, J. A., Webber, W. R. The 11-year solar modulation of cosmic rays as deduced from neutron monitor variations and direct measurements at low energies. Journal of Geophysical Research, 72(23), 5977-5989, 1967.

Lucio, P. S. Learning with solar activity influence on Portugal's rainfall: A stochastic overview. Geophys. Res. Lett., 32, L23819, doi:10.1029/2005GL023787, 2005.

McIlroy, D. mapproj: Map Projections. R package version 1.2-2, http://CRAN.R-project.org/package=mapproj, 2014.

R Development Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN: 3-900051-07-0, 2015.

Sarkar, D. Lattice: Multivariate Data Visualization with R. Springer-Verlag, New York, NY, ISBN: 978-0-387-75968-5, 2008.

Solanki, S. K., Schüssler, M., Fligge, M. Secular variation of the Sun's magnetic flux. Astronomy & Astrophysics, 383(2), 706-712, 2012.

Svensmark, H., Friis-Christensen, E. Variation of Cosmic Ray Flux and Global Cloud Coverage: A Missing Link in Solar-Climate Relationship. J. Atmos. Sol.-Terr. Phys., 59, 1225–1234, doi: http://dx.doi.org/10.1016/S1364-6826(97)00001-1, 1997.

Valdés-Galicia, J. F. Low energy galactic cosmic rays in the heliosphere. Advances in Space Research, 35(5), 755-767, doi: http://dx.doi.org/10.1016/j.asr.2005.03.149, 2005.

Veretenenko, S., Ogurtsov, M. Regional and temporal variability of solar activity and galactic cosmic ray effects on the lower atmosphere circulation. Advances in Space Research, 49(4), 770-783, doi: http://dx.doi.org/10.1016/j.asr.2011.11.020, 2012.

Zarrouk, N., Bennaceur, R. Link nature between low cloud amounts and cosmic rays through wavelet analysis. Acta Astronautica, 66(9), 1311-1319, doi: http://dx.doi.org/10.1016/j.actaastro.2009.11.001, 2010.

CHAPTER III

This chapter refers to the article entitled "Study of the Galactic Cosmic Rays and Sunspots influence over the brazilan of Northern and Northeastern precipitation", under revision at the Journal of Atmospheric and Solar-Terrestrial Physics (JASTP). This journal is classified by the CAPES Qualis indicators as B1 in the GEOSCIENCES area, with an impact factor of 1.326.

STUDY OF THE GALACTIC COSMIC RAYS AND SUNSPOTS INFLUENCE OVER THE BRAZILAN OF NORTHERN AND NORTHEASTERN PRECIPITATION

Ronabson C. Fernandes a,*, Paulo S. Lucio b, José H. Fernandez b

a Programa de Pós-Graduação em Ciências Climáticas, Universidade Federal do Rio Grande do Norte, Natal/RN, 59078970, Brasil, ronabson@hotmail.com

b Departamento de Ciências Atmosféricas e Climáticas, Universidade Federal do Rio Grande do Norte, Natal/RN, 59078970, Brasil, pslucio@ccet.ufrn.br ; jhenrix@gmail.com

* Corresponding author. Tel.: +55-84-3342-2479.

Highlights

• GCR increase rainfall in the North and Northeast of Brazil.

• The sunspot number decreases rainfall in the North and Northeast of Brazil.

• SSN and GCR influence precipitation on the monthly, seasonal, interannual and interdecadal scales.

Abstract

Rainfall is of extreme importance to society. Extreme events that cause droughts and floods bring great losses to the population. Thus, assessing the influence of the sunspot number (SSN) and of the galactic cosmic ray (GCR) flux is highly relevant to different sectors of society. Observed data from 11 rain gauge stations in the Northern (NB) and Northeastern (NEB) regions of Brazil, from 1961 to 1964, were used. These time series were compared in terms of periodicity with the aid of wavelet coherence. The results showed that SSN and GCR

(47) 47 flux worked negatively and positively, respectively in rain precipitation of NB and NEB, especially in the periodicity of 10.6 years. It was also confirmed the influence of SSN and GCR on monthly, seasonal, interannual and interdecadal of rainfall in those regions. We are not claiming that those variables (SSN and GCR) form rainfall in the region NB and NEB, but they are modules of pluviometric indexes due to availability of cosmogenic CCN. Consequently, there is the necessity of more studies to assess the different impacts of modulation of precipitation by SSN and CGR flux, making it relevant to predict extreme events of drought and floods. Keywords: Periodicity, wavelet coherence, precipitation. 1. Introduction Precipitation is of extreme importance to agriculture, livestock, economy, power sector and reservoirs, among others. Mitigation, duration and intensity of extreme events, such as droughts and floods bring great losses to society. Northern (NB) and Northeastern (NEB) region has a population of 15,864,454 inhabitants and 53,081,950 inhabitants, respectively, including a universe of 68,946,404 inhabitants. Those two regions are in responsible for 36.1% of Brazilian population. It is known that the rainy period of NB and NEB region is influenced by different producer systems of rainfall, such as Intertropical Convergence Zone (ITCZ), Easterly Wave Disturbances (EWD), instability lines, Upper Level Cyclonic Vortex (ULCV), Mesoscale Convective Complex (MCC), marine and earth winds, among others (Chaves and Cavalcanti, 2001; Molion and Bernardo, 2002; Cavalcanti et al., 2015, ). Furthermore, the Sea Surface Temperature (SST), such as El Niño and La Niña, South Atlantic dipole (SAD) and Pacific Decadal Oscillation (PDO) in a different way in rainfall period of NB and NEB (Clauzet and Wainer,1999; Geber et al., 2009; Bombardi et al.,2011; Nnamchi et al., 2011). 
Other factors also influence precipitation, such as the number of sunspots (SSN) and galactic cosmic rays (GCR). GCR are known to be strongly attenuated by solar activity (SA), and the intensity of GCR depends on the magnetic parameters of the Sun (Solanki et al., 2000; Belov et al., 2002; McCracken, 2004). Upon interacting with some gases in the Earth's atmosphere, GCR form cloud condensation nuclei (CCN) (Calisto et al., 2011; Kirkby et al., 2011). CCN can be biogenic, anthropogenic or cosmogenic. They are hygroscopic particles that are important for the formation of clouds and precipitation. The relationship between the Earth's cloudiness and GCR was found by Svensmark and Friis-Christensen (1997).

Studies have shown that precipitation responds differently to variations of GCR in the rainy and dry periods (Mavrakis and Lykoudis, 2006). Kniveton and Todd (2001) found evidence of a statistically strong relationship between the GCR flux, rainfall and precipitation efficiency over ocean surfaces at mid to high latitudes, of around 4-7% over the solar cycle in the 1980s. Aslam (2015) suggests that an increase in the GCR flux is reflected positively in precipitation and low clouds and inversely in air temperature. Thus, does the periodic variability of the SSN and the GCR flux influence rainfall in NB and NEB? The objective of this research was therefore to investigate the relationship between the precipitation of NB and NEB and the SSN and GCR flux.

2. Method and Description

2.1. Dataset

The monthly GCR time series of Huancayo, Peru, from 1961 to 2004 was used, as made available by the Russian Academy of Sciences, Geophysical Center, World Data Center (WDC) for Solar-Terrestrial Physics. The time series of Huancayo/PER (Lat = -12.03º, Long = -75.33º, Alt = 3,400 m, Cut-Off Rigidity = 13.45 GV) was extended up to 2004 based on the imputation of Fernandes et al. (2017). The SSN record was obtained from the website of the Solar Influences Data Center (SIDC), Sunspot Index and Long-term Solar Observations (SILSO), of the Royal Observatory of Belgium, Brussels (http://www.sidc.be/silso/home). The historical series of monthly total rainfall from 11 rain gauge stations, 9 of them located in NEB and 2 in NB, belong to the Instituto Nacional de Meteorologia (INMET). The study period runs from January 1st, 1961 to December 31st, 2004. These rainfall series contained missing data, which required data imputation. The spatial distribution and characteristics of the stations used are shown in Fig. 1 and Table 1, respectively.
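The text states that gaps in the rainfall series required imputation but does not describe the procedure at this point. As a purely illustrative sketch, and not the method used in this study, the following Python snippet fills missing monthly totals with the mean of the observed values for the same calendar month. The function name `fill_with_monthly_climatology` and the sample values are hypothetical.

```python
from statistics import mean


def fill_with_monthly_climatology(values, start_month=1):
    """Replace None entries in a monthly series with the mean of the
    observed values for the same calendar month.

    Assumption: this simple climatological fill is only an example;
    the source does not state which imputation method was applied.
    """
    # Collect the observed values for each calendar month (1..12).
    by_month = {m: [] for m in range(1, 13)}
    for i, v in enumerate(values):
        month = (start_month - 1 + i) % 12 + 1
        if v is not None:
            by_month[month].append(v)

    # Monthly climatology; None if a month was never observed.
    climatology = {m: (mean(vs) if vs else None) for m, vs in by_month.items()}

    # Keep observed values and fill the gaps from the climatology.
    filled = []
    for i, v in enumerate(values):
        month = (start_month - 1 + i) % 12 + 1
        filled.append(climatology[month] if v is None else v)
    return filled


# Two synthetic years of monthly totals (mm); the values are illustrative only.
year1 = [100.0, 120.0, 200.0, 250.0, 180.0, 90.0, 60.0, 30.0, 20.0, 15.0, 25.0, 50.0]
year2 = [110.0, None, 220.0, 230.0, None, 80.0, 70.0, 40.0, 10.0, 25.0, 35.0, 60.0]
filled = fill_with_monthly_climatology(year1 + year2)
```

With only two years of data, each gap simply takes the value observed in the same month of the other year. A longer record, such as the 44 years used here, would average over all available years for that month.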
Table 1 - Location (longitude, latitude and altitude) of the 11 rain gauges for the period from January 1st, 1961 to December 31st, 2004.

Station   City               Longitude (deg)   Latitude (deg)   Altitude (m)
83096     Aracaju – SE       -37.04            -10.95           4.72
82191     Belém – PA         -48.43            -1.43            10.00
82397     Fortaleza – CE     -38.54            -3.75            26.45
82798     João Pessoa – PB   -34.86            -7.1             7.43
82994     Maceió – AL        -35.7             -9.66            64.50
82331     Manaus             -60.01            -3.1             61.25
82598     Natal – RN         -35.2             -5.91            48.60
82900     Recife – PE        -34.95            -8.05            10.00
83229     Salvador – BA      -38.5             -13              51.41
82280     São Luis – MA      -44.21            -2.53            50.86
82578     Teresina – PI      -42.81            -5.08            74.36

Fig. 1. Location of the 11 rain gauge stations and of the GCR station located at Huancayo (Peru).

2.2. Wavelet

To assess the coherence between the observed series of SSN and GCR flux and precipitation, the continuous Morlet wavelet (WM), also known as the continuous wavelet transform (WCT), was applied to each time series. The algorithm of WCT is available at
