
In silico study of the genetic bases of reproductive failures


Academic year: 2021


UNIVERSIDADE FEDERAL DO RIO GRANDE DO SUL
DEPARTMENT OF GENETICS

GRADUATE PROGRAM IN GENETICS AND MOLECULAR BIOLOGY

IN SILICO STUDY OF THE GENETIC BASES OF REPRODUCTIVE FAILURES

Flavia Gobetti Gomes

Advisor: Maria Teresa Vieira Sanseverino
Co-advisor: Lucas Rosa Fraga


Dissertation submitted to the Graduate Program in Genetics and Molecular Biology of UFRGS in partial fulfillment of the requirements for the degree of Master in Genetics and Molecular Biology.

FUNDING INSTITUTIONS

This work was carried out at the Laboratory of Medical Genetics and Evolution of the Department of Genetics, Institute of Biosciences, Universidade Federal do Rio Grande do Sul, and at the Hospital de Clínicas de Porto Alegre (HCPA).

ACKNOWLEDGMENTS

First, I would like to thank my parents, who always encouraged me and spared no effort so that I could achieve all my goals, and who, however far away they may be, followed every step of this new achievement.

To my husband Magnus, for all his patience and understanding during this period of dedication to the project, for encouraging me every time I felt lost, and for listening to my work countless times over these years, even though he works in a completely different field.

To baby Lucca, who is still on the way but already plays an extremely important role in my life, changing the way I see everything and giving even greater value to this subject of study.

I want to thank my advisor Tere and my co-advisor Lucas for welcoming me so warmly, accepting my limitations and encouraging me at every moment, even when I did not believe I could manage. They were truly the best advisors I could have had.

To Thayne, who set aside countless hours of her (highly sought-after) schedule to help me and who, with all her patience and kindness, taught me everything I needed to obtain these results. Without you this project would not have been possible! Thank you very much.

I want to thank Dr. Angela and Dr. Gustavo, who supported this dream from the beginning, encouraging me, serving as examples and making our whole work schedule flexible so that it could happen. I also thank the entire Embrios team for their help with work logistics and scheduling.

I want to thank my friends and family for understanding my absence over these last 2 years and for supporting me until the completion of this dream.

To my research group, João, Marcus and Emília, for all the scientific and emotional support in the project, in the courses and in life.

To everyone at Lab113 and to Professor Lavínia, for providing the space and a very welcoming environment.

I also thank all the professors of the PPGBM and the staff of the Medical Genetics Service of the Hospital de Clínicas de Porto Alegre for the conversations and lessons, and Elmo of the PPGBM for his patience and willingness to help me with all my questions and scheduling peculiarities.

Finally, I thank God for giving me this unique opportunity, for giving me strength in difficult moments and for protecting me on the road on every trip to and from classes and meetings.

Contents

SUMMARY
ABSTRACT
CHAPTER 1: Introduction
Reproductive Failures
Abortions
Implantation Failures
Experimental Assays in Reproductive Failures
CHAPTER 2: Objective and Rationale
CHAPTER 3: In Silico Study of the Genetic Bases of Reproductive ...
CHAPTER 4: Discussion
CHAPTER 5: Conclusion and Perspectives

SUMMARY

Human reproduction is considerably inefficient, since about 70% of conceptions do not survive to birth. It is estimated that approximately 50% of these pregnancies fail to progress properly and are lost even before clinical recognition or the presence of embryonic cardiac activity. Despite major advances in the diagnosis and treatment of infertility, its prevalence grows every year. According to a World Health Organization (WHO) report, one in six couples faces some difficulty when trying to conceive, corresponding to 80 million people worldwide, largely as a result of reproductive failures related to abortions and implantation failures. Among the known causes of the various types of reproductive failure, 50% are believed to be related to genetic alterations, ranging from chromosomal aberrations to single- or multiple-gene mutations and polymorphisms.

The genetics of fertility is a vast subject and, despite many studies, no genes are well established as susceptibility markers for these reproductive failures. Further research is therefore needed to identify potential genes related to this condition. This study sought, through in silico analysis, to determine the main genes involved in two reproductive failures: recurrent pregnancy loss and implantation failure.

By searching the Online Mendelian Inheritance in Man (OMIM), Human Genome Epidemiology encyclopedia (HuGE) and Comparative Toxicogenomics Database (CTD) databases and applying systems biology, it was possible not only to analyze the main mechanisms involved in the two reproductive failures studied but also to identify potential biomarker genes for these conditions, namely PCNA, NOP58, EGFR, ACTA2, CCT5, ANAPC10, NMD3, DNAJC8, ELP3, UTP15, HMGCS1, TGFA, CDC25A, DIAPH3, CCNA2, NEK2 and FZR1.

These genes proved essential for the formation of the main protein-protein interaction networks related to the mechanisms involved in the reproductive failures studied. In a literature search, among the possible candidates, the genes PCNA, EGFR, TGFA and FZR1 had been related to reproductive failures, whereas genes such as ANAPC10, DNAJC8, CDC25A, CCNA2 and NEK2 are highly important in mechanisms involved in the success of a pregnancy, such as cell cycle regulation, acting directly on the balance between cell proliferation and apoptosis.

Although all the genes found are linked to biological processes involved in embryo implantation and pregnancy maintenance, and some have already been closely linked to reproductive failures, there are as yet no studies relating polymorphisms of these genes to patients with a history of recurrent pregnancy loss and/or implantation failure. The results obtained may open new paths for more in-depth study of these two reproductive conditions.

ABSTRACT

Human reproduction is considerably inefficient, since about 70% of conceptions do not survive until birth. It is estimated that approximately 50% of these pregnancies do not progress properly, being lost even before clinical recognition or the presence of embryonic cardiac activity. Despite great advances in the diagnosis and treatment of infertility, its prevalence grows every year. According to a report of the World Health Organization (WHO), one in six couples faces some type of difficulty when trying to conceive, corresponding to a total of 80 million people worldwide, a large part resulting from reproductive failures related to abortions and implantation failures. Among the known causes of the different types of reproductive failure, 50% are believed to be related to genetic alterations, ranging from chromosomal aberrations to single or multiple gene mutations and polymorphisms.

The genetics of fertility is a vast subject and, despite several studies, there are no well-defined genes serving as susceptibility markers for these reproductive failures. Therefore, further research is needed to identify potential genes related to this condition. In this study, we aimed, through in silico analysis, to determine the main genes involved in two reproductive failures: recurrent pregnancy loss and implantation failure.

By searching the Online Mendelian Inheritance in Man (OMIM), Human Genome Epidemiology encyclopedia (HuGE) and Comparative Toxicogenomics Database (CTD) databases and using systems biology, it was possible, in addition to analyzing the main mechanisms involved in both reproductive failures studied, to determine potential biomarker genes for these conditions, namely PCNA, NOP58, EGFR, ACTA2, CCT5, ANAPC10, NMD3, DNAJC8, ELP3, UTP15, HMGCS1, TGFA, CDC25A, DIAPH3, CCNA2, NEK2 and FZR1. These genes proved to be essential for the formation of the main protein-protein interaction networks related to the mechanisms involved in the reproductive failures. After a literature search, among these possible candidates, the PCNA, EGFR, TGFA and FZR1 genes have been related to reproductive failures, whereas genes such as ANAPC10, DNAJC8, CDC25A, CCNA2 and NEK2 are important in mechanisms involved in the success of a pregnancy, such as the regulation of the cell cycle, acting directly on the balance between cell proliferation and apoptosis.

Although all genes found are linked to biological processes involved in embryonic implantation and pregnancy maintenance, and some have even been closely linked to reproductive failure, there are still no studies that relate polymorphisms of these genes to patients with a history of recurrent pregnancy loss and/or implantation failure. It is believed that the results obtained may provide new paths for further study of these two reproductive conditions.

Chapter 1 – INTRODUCTION

REPRODUCTIVE FAILURES

1.1 Abortions

Spontaneous abortion is the loss of a clinically established pregnancy before the fetus has developed sufficiently to be viable, or up to 24 weeks of gestation (Van Den Berg et al, 2014; ESHRE, 2017). It can be subdivided into early abortions (before 12 weeks of gestation) and late abortions (from 12 to 24 weeks).

Between 15 and 25% of all recognized pregnancies end in abortion, most of them during the first weeks: fewer than 5% of losses occur between the 13th and 19th weeks, and only 0.3% occur after 20 weeks (Michels and Tiu, 2007; Wang et al, 2003).

The incidence of abortion varies with maternal age: women under 35 have a 9 to 12% risk of pregnancy loss in the first trimester, whereas for women aged 40 or older this risk rises to 50% (Shahine et al, 2015).

The occurrence of two or more spontaneous abortions, known as Recurrent Pregnancy Loss (RPL), is a somewhat less common condition. It is estimated that approximately 5% of women will have two pregnancy losses and that only 1% will have three or more (Kaser et al, 2018).

RPL is associated with several causes, such as chromosomal abnormalities, thrombophilias, and immunological, hormonal, metabolic, infectious and anatomical factors, among others (Kaser et al, 2018). Among the genetic causes of pregnancy loss, approximately 50% of cases are associated with aneuploidies, the most frequent being trisomies (51.9%), polyploidies (18.8%), monosomies (15.2%), structural chromosomal defects (6.5%) and others (7.6%) (Zhang et al, 2018). However, approximately half of RPL cases remain of unknown etiology (Ford and Schust, 2009).

Several studies search for polymorphisms in different genes that may be associated with susceptibility to idiopathic RPL, for example genes involved in the immune response (IFNG, IL10, KIR2DS2, KIR2DS3, KIR2DS4, MBL, TNF), coagulation (F2, F5, PAI-1, PROZ), metabolism (GSTT1, MTHFR), angiogenesis (NOS3, PTGS2, VEGFA) and tumor suppression (TP53), among others (Kaser et al, 2018). However, most of these studies look for susceptibility factors starting from candidate genes and therefore assess specific, isolated causes of this reproductive failure. Because RPL involves complex mechanisms and a large share of cases have no known etiology, a more comprehensive genetic investigation of the possible factors related to RPL is needed to identify new target genes that may be more strongly involved in this reproductive failure.

1.2 Implantation Failure

Embryo implantation occurs through the interaction between a blastocyst-stage embryo and a receptive uterus within a limited period known as the implantation window. This process requires synchronization between the implantation competence of the blastocyst and a receptive state of the uterine endometrium (Zhang et al, 2013; Dey et al., 2004). Molecular and genetic evidence indicates that these two events are precisely regulated by maternal hormones, especially estrogen and progesterone, together with locally produced signaling molecules, including cytokines, growth factors, transcription factors, lipid mediators and morphogen genes, through autocrine, paracrine and juxtacrine interactions (Zhang et al, 2013).

Embryo implantation depends on trophoblast proliferation, trophoblast invasion of the endometrium and local angiogenesis, processes regulated by proteins that control the balance between growth factors and apoptosis (Mojarrad et al, 2013). Implantation is a critical step in establishing pregnancy and requires molecular and cellular events that result in uterine growth and differentiation, blastocyst adhesion, invasion and placenta formation. Successful implantation requires a receptive endometrium, a normal and functional blastocyst-stage embryo, and synchronized dialogue between maternal and embryonic tissues. Beyond the well-characterized role of sex hormones, the complexity of embryo implantation and placentation is reflected in the number of cytokines and growth factors with demonstrated roles in these processes. Variations in the normal expression level and action of these cytokines result in absolute or partial implantation failure and abnormal placental formation in mice and humans (Guzeloglu-Kayisli et al, 2009).

According to Coughlan et al. (2014), implantation failure (IF) is defined as failure to achieve pregnancy after the transfer of at least four good-quality embryos in a minimum of three in vitro fertilization cycles in women under 40 years of age. Advances in assisted reproduction techniques have made possible a broader analysis of embryo implantation and its possible failures, and IF is currently estimated to account for at least 40% of unsuccessful assisted reproduction cycles (Quintero et al. 2017).

Studies suggest that genetic factors involved in the implantation process are decisive for establishing pregnancy, so that alterations in the related genes may cause, or at least increase susceptibility to, implantation failure (Bashiri et al, 2018; Goodman et al, 2008; Krussel et al, 2003; Maruyama et al, 2008). Several genes involved in trophoblast invasion and angiogenesis have already been identified. Among them, TP53, a tumor suppressor and potent inducer of apoptosis and angiogenesis, has had its low expression associated with implantation failure (Feng et al, 2016). In addition, recent studies have shown that TP53 is also associated with this reproductive failure through its regulation of leukemia inhibitory factor (LIF), which is important in embryo implantation. Other genes, such as Mdm2 and Mdm4, are also implicated because they regulate TP53 expression (Mojarrad et al, 2013).

Besides members of the p53 signaling pathway, several other genes are related to the implantation process. Most of them are involved in the invasion of the endometrium by embryonic tissues and in the hormonal homeostasis of pregnancy (Mojarrad et al, 2013).

Members of the gp130 cytokine family, interleukin-11 (IL-11) and LIF, the transforming growth factor beta superfamily, colony-stimulating factors, and the IL-1 and IL-15 systems are crucial molecules for successful implantation (Guzeloglu-Kayisli et al, 2009). Chemokines are also important, both in recruiting specific leukocyte cohorts to the implantation site and in trophoblast trafficking and differentiation (Guzeloglu-Kayisli et al, 2009).

Another important factor in reproductive physiology is nitric oxide (NO), which participates in trophoblast invasion, regulation of the feto-placental circulation and apoptosis of trophoblast cells (Webster et al. 2008). In addition, genes such as MUC-1, MUC-4 and hPR have also been associated with the implantation process (Mojarrad et al, 2013).

Given the importance of these factors at the different stages of embryo implantation, polymorphisms in them may plausibly be related to failures in the process. Despite the many studies on factors involved in embryo implantation, its regulation, and how genetic alterations may relate to IF, are still not well defined. Once again, genetic studies of patients with this reproductive failure are warranted in order to identify genes involved in its etiology.

1.3. Experimental Assays in Reproductive Failures

Because of the varied etiologies, many studies have had limited success in identifying genetic susceptibility factors directly related to RPL and IF (Quintero et al. 2017). In addition, the molecular complexity of each step of the mammalian reproductive process makes it challenging to identify the genomic regions responsible for complex traits, which hampers the design of a strategy for selecting candidate genes for reproductive failures (Quintero et al. 2017).

Several studies analyze differential gene expression (DGE) between maternal and fetal tissues in cases of reproductive failure. Their results reveal different genes with DGE and numerous molecular pathways affected in these failures, demonstrating the complexity of the processes involved in these conditions.

Altmae et al (2010) investigated the endometrial expression profile of women with unexplained infertility, compared with fertile women, at the time of embryo implantation, in order to identify possible biomarkers of uterine receptivity and mechanisms related to infertility. They identified 145 significantly up-regulated genes and 115 down-regulated genes in the infertile women compared with controls. These included mucins (MUC4, MUC5B), metalloproteinases (MMP8, MMP10, MMP26), cytokines and chemokines (SCGB3A1, FAM3D, FAM3B, CCR7, CXCL6, IL21, CMTM5), integrins, immunomodulators, genes regulating lipid metabolism, and others. Of particular interest are the molecules that take part in attaching the embryo to the maternal endometrial epithelium, such as adhesion molecules and matrix-degrading extracellular metalloproteinases.

Lédée et al (2010) also observed significant differences in gene expression in the endometrium of women with IF and RPL compared with women with no history of reproductive failure. These differentially expressed genes were related to various biological processes, such as cell cycle, gene expression, hematological and nervous system function, cell signaling, immune response, and cellular assembly and organization. Salker et al (2018), analyzing endometrial expression at different phases of the menstrual cycle, identified LEFTY2, a member of the transforming growth factor beta (TGF-β) family, as a regulator of receptivity. The protein encoded by this gene plays an important role in determining left-right asymmetry of organ position during vertebrate embryonic development (Meno et al, 1997) and may also play an important role in endometrial bleeding (Kothapalli et al, 1997). Mutations in this gene have been associated with left-right axis malformations, particularly of the heart and lungs, and some types of infertility have been associated with dysregulated expression of this gene in the endometrium (Tabibzadeh et al, 2000).

In another study, analysis of the endometrial expression profile during the support of a normal pregnancy identified the NLRP2 gene as a possible biomarker for predicting embryo implantation (Li et al, 2017). This gene belongs to the NLRP family, known as an important regulator of the innate immune response and the inflammatory pathway (Kufer et al, 2011). Studies in mice have shown that it plays an important role in reproduction, since maternal NLRP2 deficiency can cause embryonic lethality (Peng et al, 2017).

Besides these endometrial profile analyses, several studies seek to establish a genetic profile that can be related to fertilization rate, embryo implantation potential and uneventful pregnancies in patients undergoing assisted reproduction procedures, based on expression in granulosa cells of the cumulus-oocyte complex (Papler et al, 2015; Wathlet et al, 2012).

Wathlet et al (2012) proposed the genes EFNB2 and CAMK1D as possible predictive markers of embryo quality and implantation efficiency. EFNB2 has been related to cell migration, spreading and adhesion during blood vessel formation (Foo et al, 2006), whereas CAMK1D, expressed in polymorphonuclear leukocytes, is part of the chemokine signal transduction pathway that regulates granulocyte function (Verploegen et al, 2000).

In addition, DGE of some genes is common to several pregnancy complications and serves as a biomarker for them. Altered expression of genes related to the inflammatory response and apoptosis has been linked to recurrent pregnancy loss and hydatidiform mole (Kim et al, 2006), whereas altered transcription of genes regulating metabolic pathways has been associated with fetal growth problems and maternal complications such as preeclampsia and gestational diabetes (Enquobahrie et al, 2009; Struwe et al, 2010).

Gene expression studies can be deposited in the Gene Expression Omnibus (GEO), which was created as an initiative of the National Center for Biotechnology Information (NCBI) to serve as a public repository for large-scale gene expression data (Edgar et al., 2002). Since 2013, GEO has hosted microarray and next-generation sequencing data submitted by the scientific community (Barrett et al., 2013). These raw data are available for access and use in secondary-data research. They have proved useful for understanding the molecular mechanisms of various processes involved in conception and their possible relationship with reproductive failures.

Even with different functions and often involvement in unrelated mechanisms, these genes (among many others) are interconnected in a network of processes essential for a healthy pregnancy. A direct search for genes involved in one specific condition can often yield an isolated or even incomplete result.

Reproductive failures, even with different etiologies, may share affected reproductive processes, and searching for markers concomitantly involved in more than one reproductive failure may point to biomarkers useful in clinical practice.

Chapter 2 – RATIONALE AND OBJECTIVES

2.1. RATIONALE

Despite the extensive search for susceptibility factors for reproductive failures, most studies take the approach of analyzing candidate genes related to specific reproductive processes, thereby assessing isolated aspects of reproductive failure. Although human reproduction is a complex process, the factors that lead to its success or to its different forms of failure are believed to overlap in many cases. Several genes may be responsible for different events in pregnancy and may therefore be involved in varied causes of reproductive failure.

Few studies assess the participation of genes in different adverse pregnancy events, so a broader approach is needed that searches for genes involved in implantation failure and recurrent pregnancy loss concomitantly.

In silico studies, experimental data and the literature offer a wide range of information that can point to genetic factors possibly involved in reproductive failures. Using different platforms could therefore lead to knowledge of the genetic bases of reproductive failures, both those specific to each failure and those shared between them.

From these results, it would be possible to define potential biomarkers involved in RPL and IF. This would be an important tool for clinical practice, since it would facilitate diagnosis of the cause of reproductive failure and the couple's prognosis, thereby reducing the costs and suffering involved in trying to conceive. With personalized medicine, the management of each patient would then be more precise and efficient, and it might even be possible to improve success rates in in vitro fertilization cycles.

2.2. OBJECTIVES

2.2.1. GENERAL OBJECTIVE

To evaluate, through literature review and different bioinformatics approaches, genetic factors involved in reproductive failures.

2.2.2. SPECIFIC OBJECTIVES

1 – To survey genes already associated with implantation failure, abortion and recurrent pregnancy loss by searching the Online Mendelian Inheritance in Man (OMIM), Human Genome Epidemiology encyclopedia (HuGE) and Comparative Toxicogenomics Database (CTD) databases;

2 – To evaluate and compare gene expression in endometrial tissue in the pre-implantation period, using data deposited in the Gene Expression Omnibus (GEO) database;

3 – To identify the genes shared between those found in the databases described in Objective 1 and the genes from the expression data described in Objective 2;

4 – To analyze the interactions of the genes found in the different searches and their ontologies;

5 – To determine possible candidate genes obtained from the searches and comparisons, identifying in the literature the already known associations of these genes with the reproductive failures studied.
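The gene-overlap step in Objectives 1 to 3 can be pictured as a set operation: pool the genes retrieved from the curated databases, then keep those that also appear among the differentially expressed genes. The sketch below is only an illustration under stated assumptions. The function names, the choice to pool the three databases by union, and the placeholder symbols (GENE_A, GENE_B, ...) are hypothetical and are not taken from the study's actual pipeline or data.

```python
from typing import Dict, Set


def union_of_databases(db_hits: Dict[str, Set[str]]) -> Set[str]:
    """Pool the genes reported by any curated database (assumed reading of Objective 1)."""
    pooled: Set[str] = set()
    for genes in db_hits.values():
        # Upper-case the symbols so the same gene matches across sources.
        pooled |= {g.upper() for g in genes}
    return pooled


def shared_genes(curated: Set[str], expressed: Set[str]) -> Set[str]:
    """Genes present in both the curated pool and the expression set (Objective 3)."""
    return curated & {g.upper() for g in expressed}


# Toy placeholder inputs, NOT data from the study.
db_hits = {
    "OMIM": {"GENE_A", "GENE_B"},
    "HuGE": {"GENE_B", "GENE_C"},
    "CTD": {"gene_c", "GENE_D"},
}
geo_deg = {"GENE_B", "GENE_D", "GENE_E"}  # hypothetical differentially expressed genes

curated = union_of_databases(db_hits)
candidates = sorted(shared_genes(curated, geo_deg))
print(candidates)
```

With these placeholder inputs, only the symbols present in both the pooled database set and the expression set are kept. In practice the inputs would be the gene lists exported from each database and from the GEO analysis, and symbol normalization would likely need more care than simple upper-casing.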

Chapter 3 – In Silico Study of Genetic Bases of Reproductive Failure

In Silico Study of Genetic Bases of Reproductive Failure

F.G. Gomes1; J.M. Bremm1; M. Michels1; T.W. Kowalski1,3,4; L.R. Fraga2,3; M.T.V. Sanseverino1,5,6

1Post-Graduation Program in Genetics and Molecular Biology, Department of Genetics, Biosciences Institute, Universidade Federal do Rio Grande do Sul, Porto Alegre, 91501-970, Brazil.

2Department of Morphological Science, Institute of Basic Health Sciences, Universidade Federal do Rio Grande do Sul, Porto Alegre, 90050-170, Brazil.

3Genomic Medicine Laboratory, Experimental Research Centre, Hospital de Clínicas de Porto Alegre

4Complexo de Ensino Superior de Cachoeirinha, CESUCA, Cachoeirinha, RS, Brazil

5School of Medicine, Pontifícia Universidade Católica do Rio Grande do Sul, Porto Alegre, 90619-900, Brazil.

6Medical Genetics Service, Hospital de Clínicas de Porto Alegre, Porto Alegre, 90035-903, Brazil.

*Corresponding authors:

Maria Teresa Vieira Sanseverino, MD, PhD

Serviço de Genética Médica, Hospital de Clínicas de Porto Alegre. CEP 90050-170/ Porto Alegre – RS – Brasil

Phone:
E-mail: msanseverino@hcpa.edu.br

AND

Departamento de Ciências Morfológicas, Universidade Federal do Rio Grande do Sul. CEP 90035-903/ Porto Alegre – RS – Brasil

Phone: +55 51 33084021
E-mail: lrfraga@ufrgs.br

ABSTRACT: Infertility, the inability to conceive, whether due to difficulty in getting pregnant or in maintaining pregnancy, is related to female factors in 45% of cases, male factors in 30% and unknown causes in 25%. Among the known causes of the various types of reproductive failure, 50% are believed to be related to genetic changes, ranging from chromosomal aberrations to single or multiple gene mutations and polymorphisms. The genetics of fertility is a vast subject and, despite several studies, there are no well-defined genes serving as susceptibility markers for these reproductive failures. Therefore, further research is needed to identify potential genes related to this condition.

Bioinformatics tools, experimental data and the literature present a wide range of information that can provide indications of genetic factors possibly involved in reproductive failure.

Thus, in this study, through a search of the Online Mendelian Inheritance in Man (OMIM), Human Genome Epidemiology encyclopedia (HuGE) and Comparative Toxicogenomics Database (CTD) databases and the use of systems biology, it was possible, in addition to analyzing the main mechanisms involved in the two reproductive failures studied, to determine potential biomarkers for these reproductive conditions, namely PCNA, NOP58, EGFR, ACTA2, CCT5, ANAPC10, NMD3, DNAJC8, ELP3, UTP15, HMGCS1, TGFA, CDC25A, DIAPH3, CCNA2, NEK2 and FZR1.

These genes proved to be essential for the formation of the main protein-protein interaction networks related to the mechanisms involved in the reproductive failures. After a literature search, among these possible candidates, the PCNA, EGFR, TGFA and FZR1 genes have been related to reproductive failures, whereas genes such as ANAPC10, DNAJC8, CDC25A, CCNA2 and NEK2 are important in mechanisms involved in the success of a pregnancy, such as the regulation of the cell cycle, acting directly on the balance between cell proliferation and apoptosis.

Although all genes found are linked to biological processes involved in embryonic implantation and pregnancy maintenance, and some have even been closely linked to reproductive failure, there are still no studies that relate polymorphisms of these genes to patients with a history of recurrent pregnancy loss and/or implantation failure. It is believed that the results obtained may provide new paths for further study of these two reproductive conditions.

Key words: Recurrent Pregnancy Loss, Implantation Failure, reproductive failure

1. INTRODUCTION:

Human reproduction is considerably ineffective, since about 70% of its conceptions do not survive until birth; and approximately 50% are lost before the clinical recognition or embryonic cardiac activity (Hyde and Schust, 2015). Despite great advances in the diagnosis and treatment of infertility, its prevalence grows every year. According to the report of the World Health Organization (WHO), one in six couples face some type of difficulty when trying to conceive, corresponding to a total of 80 million people worldwide (Tournaye and Cohlen, 2012), most of them related to abortions and implantation failures. Between 15 to 25% of all recognized pregnancies end in abortion, and it is currently estimated that implantation failures are responsible for at least 40% of failures in assisted reproduction cycles (Quintero et al. 2017)

Spontaneous abortion consists of the loss of a clinically established pregnancy before the fetus has developed sufficiently to guarantee its viability, or up to 24 weeks of gestation (Van Den Berg et al, 2014). The occurrence of two or more spontaneous abortions, a scenario known as Recurrent Pregnancy Loss (RPL), is a less common condition. It is estimated that approximately 5% of women will have two pregnancy losses and that only 1% will have three or more (Kaser et al, 2018). RPL is associated with several causes, such as chromosomal abnormalities, thrombophilia, and immunological, hormonal, metabolic, infectious and anatomical factors, among others (Kaser et al, 2018). However, approximately half of RPL cases have an unknown etiology (Ford and Schust, 2009; Khalife et al, 2018).

Embryonic implantation occurs through the interaction of a viable blastocyst and a receptive uterus within a limited period of time, known as the implantation window. This process depends on the proliferation of the trophoblast, its invasion into the endometrium and local angiogenesis, processes regulated by proteins that control the balance between growth factors and apoptosis (Mojarrad et al, 2013). It is a critical stage for the establishment of pregnancy and requires molecular and cellular events that result in uterine growth and differentiation, blastocyst adhesion, invasion and formation of the placenta. According to Coughlan et al. (2014), implantation failure (IF) is considered to occur when pregnancy is not achieved after the transfer of at least four good-quality embryos in a minimum of three cycles of in vitro fertilization in women under 40.

The advancement of assisted reproduction techniques has made possible a broader analysis of embryonic implantations and their possible failures.

Studies suggest that genetic factors involved in the implantation process are determinant in the establishment of pregnancy, and thus changes in the related genes can lead to, or at least increase the susceptibility to, implantation failure (Goodman et al, 2008; Krussel et al, 2003; Maruyama et al, 2008).

Due to the varied etiologies, many studies have had limited success in identifying genetic susceptibility factors directly related to RPL and IF (Quintero et al. 2017). In addition, the molecular complexity of each stage of the reproductive process in mammals makes it difficult to identify the genomic regions responsible for complex traits, which in turn hinders the design of a strategy for selecting candidate genes for reproductive failure (Quintero et al. 2017).

Several studies have already analyzed differential expression in maternal and fetal tissues in reproductive failures (Altmae et al, 2010; Lédée et al, 2010; Li et al, 2017; Salker et al, 2018). The results showed a variety of differentially expressed genes, such as mucins (MUC4, MUC5B), metalloproteinases (MMP8, MMP10, MMP26), cytokines and chemokines (SCGB3A1, FAM3D, FAM3B, CCR7, CXCL6, IL21, CMTM5), integrins and immunomodulators, as well as many affected molecular pathways, such as cell cycle, gene expression, hematological and nervous system function, cell signaling, immune response, and cell assembly and organization. These data have been useful for understanding the molecular mechanisms of various processes involved in pregnancy and their possible relationship with reproductive failures. However, the wide variety of results reflects the complexity of the processes involved in these conditions (Li et al, 2017).

A more complete genetic investigation of the factors related to these conditions is necessary in order to identify potential target genes strongly involved in these reproductive failures, which could serve as useful biomarkers for clinical practice. Bioinformatics tools, experimental data and the literature offer a wide range of information that can indicate genetic factors possibly involved in reproductive failure. Thus, the use of different platforms could reveal the genetic bases of reproductive failures, both for each condition separately and for the two in common. This would represent an important tool for clinical practice, since it would facilitate the diagnosis of the cause of reproductive failure and the couple's prognosis, thus reducing the costs and suffering involved in the search for conception. With personalized medicine, the management of each patient would be more accurate and efficient, and it would even be possible to improve the success rates of IVF cycles.

Using different bioinformatics approaches, the genetic factors involved in reproductive failures were evaluated in order to define possible candidate genes. Initially, gene-phenotype databases were searched for genes related to reproductive failures. Concomitantly, gene expression in patients with reproductive failures was evaluated using the public GEO database. The genes found in both search strategies were analyzed in the STRING tool to assemble protein-protein networks, and the biological processes in which they are involved were then evaluated using Gene Ontology (GO) and KEGG Pathways.


2. MATERIALS AND METHODS

The methodology used was an adaptation of the study conducted by Trouvé et al, 2017 (Figure 1).

Figure 1. Study design. Common genes were selected from the search in the OMIM, CTD and HuGE databases for each reproductive failure. Candidate genes were also selected after expression analysis by GEO. From these initial searches, their interactions and ontology were studied. Finally, potential candidate genes were determined in common and a search was made on their relationship with reproductive failures, in studies available at PubMed


2.1. Search in Databases

Initially, a search for genes potentially involved in reproductive failures was carried out in databases relating genotype-phenotype: Online Mendelian Inheritance in Man - OMIM (McKusick-Nathans, 2019), Comparative Toxicogenomics Database - CTD (Davis et al. 2019) and Human Genome Epidemiology encyclopedia – HuGE (Yu et al, 2008).

In all the databases, the keywords used were “miscarriage”, “abortion”, “recurrent pregnancy loss” and “recurrent abortion” for the condition “Recurrent Pregnancy Loss”, and “implantation failure” and “embryo loss” for the condition “Implantation Failure” (Chart 1).

Condition                   Key words
Recurrent Pregnancy Loss    Recurrent pregnancy loss; Miscarriage; Recurrent abortion; Abortion
Implantation Failure        Implantation failure; Embryo loss

Chart 1. Key words used in all database searches.

With the results of the search, a list of genes was made for each database and for each specific condition researched. The lists were then compared in two ways: to find the genes in common between the two conditions, and to find the genes in common for each condition across the different databases. For each comparison, a Venn diagram was created using the Venny 2.1 software.
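The two comparisons described above amount to set intersections. The following is a minimal Python sketch of that idea, not the Venny 2.1 workflow itself; the gene names and list contents are hypothetical placeholders rather than the study's search results.

```python
from itertools import combinations


def common_genes(gene_lists):
    """Return the genes present in every list of the given mapping."""
    sets = [set(genes) for genes in gene_lists.values()]
    return set.intersection(*sets) if sets else set()


def pairwise_overlaps(gene_lists):
    """Return the shared genes for each pair of named gene lists."""
    return {
        (a, b): set(gene_lists[a]) & set(gene_lists[b])
        for a, b in combinations(sorted(gene_lists), 2)
    }


# Hypothetical toy lists (assumption: placeholders, not real search output).
rpl_by_database = {
    "OMIM": ["GENE_A", "GENE_B", "GENE_C"],
    "HuGE": ["GENE_B", "GENE_C", "GENE_D"],
    "CTD": ["GENE_C", "GENE_E"],
}
if_all_databases = {"GENE_C", "GENE_D", "GENE_F"}

# Genes reported for RPL in all three databases.
rpl_in_all = common_genes(rpl_by_database)

# Genes shared between the two conditions, pooling the RPL databases first.
rpl_union = set().union(*rpl_by_database.values())
shared_rpl_if = rpl_union & if_all_databases
```

Pooling the databases before intersecting the conditions is one possible reading of the comparison; intersecting database by database would be an equally simple variation.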

2.2. Gene expression analysis

Secondary analysis was performed using data from transcriptome experiments available in the Gene Expression Omnibus project - GEO (Edgar R, et al, 2002; Barret et al, 2013), which made it possible to analyze and identify differential gene expression (DGE) in reproductive failures (RF). To evaluate gene expression in patients with a history of RF, studies analyzing endometrial expression in the preimplantation period of patients with a history of implantation failure (IF) and/or recurrent pregnancy loss (RPL) were selected, and this expression was then compared with the endometrial expression of fertile women.

Raw data were used, available in .fastq (RNA-seq) and .CEL (microarray) formats at GEO or in the European Nucleotide Archive (ENA). All data were normalized in the R environment (v.1.1.463), according to packages made available by the experimental testing platforms (Affymetrix and Illumina).

For the initial selection of studies, we used the following inclusion and exclusion criteria:

- Inclusion Criteria: only trials related to reproductive failures in humans, in previously specified tissues, registered and made publicly available in the GEO project were selected. Trials registered as “dataset” or “series” were chosen. Only studies with a clear description of the experimental design used were included.

- Exclusion Criteria: studies with samples from individuals affected by genetic syndromes and/or other conditions that may affect the expression of the genes to be evaluated were not used, nor were studies performed on placental tissue, in order to exclude possible embryonic causes of the reproductive failure.

For the analysis, the data available in the following Datasets were used: GSE65102 (Lucas et al, 2015) and GSE26787 (Lédée et al, 2011). The samples were assigned to the case or control groups.

In these analyses, the limma (Ritchie et al, 2015) and affy (Gautier et al, 2004) packages were used for microarray data and the edgeR package (Robinson et al, 2010; McCarthy et al 2012) for RNA-seq data. All packages are available in the Bioconductor repository and were designed to analyze complex experiments involving comparisons between many RNA targets simultaneously, by fitting a linear model to the expression data of each gene.

Differentially expressed transcripts between normal and pathological samples were selected using a p-value threshold of p < 0.05. Analyses that did not match any genes were eliminated.
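The selection step can be illustrated with a short Python sketch. It assumes that a per-gene table of log fold changes and p-values has already been produced upstream (in the study, by limma or edgeR in R); the function name, the table layout and the values are hypothetical, and the sketch does not reproduce the model fitting itself.

```python
def split_differential_genes(results, alpha=0.05):
    """Keep genes with p < alpha and split them by direction of change.

    `results` maps a gene name to a (log fold change, p-value) pair,
    where a positive log fold change means higher expression in cases.
    """
    over, under = [], []
    for gene, (log_fc, p_value) in results.items():
        if p_value >= alpha:
            continue  # not significant at the chosen threshold
        if log_fc > 0:
            over.append(gene)
        elif log_fc < 0:
            under.append(gene)
    return sorted(over), sorted(under)


# Hypothetical toy table (assumption: illustrative values only).
toy_results = {
    "GENE_A": (1.8, 0.003),
    "GENE_B": (-2.1, 0.020),
    "GENE_C": (0.4, 0.300),
    "GENE_D": (-0.9, 0.049),
}
over_expressed, under_expressed = split_differential_genes(toy_results)
```

With these toy values, GENE_C is discarded because its p-value exceeds the threshold, and the remaining genes are split by the sign of the fold change.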

2.3 Analysis and comparison of data from different data sources

The results obtained in the databases and those obtained in the expression analysis were compared between the two reproductive failures studied. Then, the genes in common between these two strategies were compared. These comparisons were made using Venn diagrams. Thus, a list of genes in common between the two conditions was formulated for each search process performed. The functional and interaction analyses were carried out on these two groups of genes, and the results were then compared with the data available in the literature.

2.4 Protein-protein interaction

To assess the possible interaction between the genes found, an analysis was performed in the Search Tool for the Retrieval of Interacting Genes and Proteins - STRING v.11.0 (Szklarczyk et al, 2019). The gene names were entered in the “Multiple proteins” field and “Homo sapiens” was selected in the “Organism” field. The following parameters were defined: only the “experiments” and “co-expression” interaction sources were considered, with a combined confidence score set at ≥0.4 (thus selecting medium-confidence interactions, as it is an experimental trial). The interaction with more than 50 other proteins was determined. With the networks formed in STRING v.11.0, a new systems biology analysis was performed in Cytoscape v.3.6.0 using the DyNet Analyzer app, which made it possible to evaluate the network statistics and the betweenness and neighborhood of these genes. Then, the modularity of these networks was analyzed in the R environment (v.1.1.463) using the igraph package.
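The effect of the confidence cutoff can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the STRING or Cytoscape procedure: the edge list, the protein names and the 0 to 1 score scale are hypothetical inputs, and the function names are introduced only for this example.

```python
from collections import defaultdict


def build_network(edges, min_score=0.4):
    """Build an undirected adjacency map from (protein, protein, score) edges,
    keeping only edges whose combined score reaches the cutoff."""
    adjacency = defaultdict(set)
    for a, b, score in edges:
        if score >= min_score and a != b:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return dict(adjacency)


def degree(adjacency):
    """Number of retained interaction partners for each protein."""
    return {node: len(neighbours) for node, neighbours in adjacency.items()}


# Hypothetical toy edges with scores on a 0-1 scale (assumption).
toy_edges = [
    ("P1", "P2", 0.90),
    ("P2", "P3", 0.45),
    ("P3", "P4", 0.20),  # below the cutoff, so P4 stays unconnected
    ("P1", "P3", 0.40),
]
network = build_network(toy_edges)
degrees = degree(network)
```

Proteins whose only edges fall below the cutoff do not appear in the adjacency map, which mirrors how unconnected query genes drop out of the network counts reported later.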

2.5. Functional enrichment

To assess the biological processes in which the identified genes are involved, functional enrichment was carried out with the Gene Ontology Consortium (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG), using the R environment (v.1.1.463) and the clusterProfiler package. In this analysis, we searched for the involvement of the genes found in processes related to human reproduction, such as embryonic implantation, homeostasis of pregnancy and embryonic-fetal development.
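Over-representation analyses of this kind are commonly based on a one-sided hypergeometric test: given a background of N genes, of which K are annotated to a term, the p-value is the probability of seeing at least the observed overlap when n query genes are drawn at random. The Python sketch below shows that calculation only; the numbers are hypothetical, and multiple-testing correction, which enrichment tools normally apply, is left out.

```python
from math import comb


def enrichment_p_value(background_size, term_size, query_size, overlap):
    """One-sided hypergeometric p-value, P(X >= overlap).

    background_size: genes in the background (N)
    term_size: background genes annotated to the term (K)
    query_size: genes in the query list (n)
    overlap: query genes annotated to the term
    """
    upper = min(term_size, query_size)
    total = comb(background_size, query_size)
    tail = sum(
        comb(term_size, k) * comb(background_size - term_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical toy numbers (assumption): 20 background genes, 5 annotated
# to the term, 4 query genes, 2 of which carry the annotation.
p_toy = enrichment_p_value(20, 5, 4, 2)
```

A small p-value indicates that the term is annotated to more query genes than expected by chance; with many terms tested, the raw values would still need adjustment before interpretation.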


Statistical analyses were performed in the R environment (v.1.1.463). Figures were assembled in the R environment (v.1.1.463) with the ggplot2 package (Wickham, H., 2016).

2.7. Search in Literature

Based on the results obtained in each section, we selected the most relevant genes as possible candidate genes. A search was performed in the PubMed literature database (https://www.ncbi.nlm.nih.gov/) to identify possible studies related to the main candidate genes found.

For this search, the inclusion criterion was original studies that relate genes, polymorphisms or gene expression to different types of reproductive failure in humans. The exclusion criteria were articles that report genes related to male infertility factors or induced abortions, case reports, reviews and meta-analyses, and articles not written in English.

2.8. Ethical aspects

All GEO data were submitted with the agreement of the original researchers, who were aware that the data are publicly available for secondary use in analyses and/or publications. According to information from GEO, it is the responsibility of the researcher who submitted the data to verify that the individuals in the sample gave consent for the results of the project to be submitted to a public database, guaranteeing the privacy of each of the participants (https://www.ncbi.nlm.nih.gov/geo). In order to carry out this research, a waiver of the consent form is requested, since the data to be used are public and not identified. The researchers are committed to the proper use of the information deposited in this database according to the proposal of this research project and sign a Term of Commitment to Use of Data, according to the institutional model, which was adapted to the characteristics of this project.

3. RESULTS

3.1. Search in Databases

We found a total of 671 genes for the condition “Recurrent Pregnancy Loss”. Of these 671 genes, 132, 358 and 181 were found in the OMIM, HuGE and CTD databases, respectively. Despite this high number of genes, only 10 were common to the three databases for this specific condition (Supplementary Table 1, 2, 3 and 4).

A smaller number of genes was found for the condition “Implantation failure”: a total of 34 genes. Of these, 9, 16 and 9 were found in the OMIM, HuGE and CTD databases, respectively, with no gene being reported in all three databases (Supplementary Table 5, 6 and 7).

When comparing the results of these two searches, that is, all the genes identified in the three different databases for the two conditions, a total of 26 genes were related to both RPL and IF (Supplementary Table 8). Among them, some genes previously described as related to RPL had already been studied by our group, including MTHFR, Factor V Leiden, NOS3 and Prothrombin (FII) (Dutra et al, 2014; Gonçalves et al, 2016).

After searching the OMIM, HuGE and CTD databases, 26 genes in common between RPL and IF were observed. These genes were compared with the genes observed in the differential expression analysis below (Figure 2).

3.2. Gene expression secondary analysis

Through the analysis of the data available in Dataset GSE65102, it was possible to evaluate the DGE of endometrial tissue from 10 patients with a history of RPL (case) compared with endometrial biopsies from 10 patients without a history of reproductive failure (control). A total of 284 differentially expressed genes were obtained, with 33 genes overexpressed in the case group and 251 genes with low expression (Supplementary Table 9).

In the analysis of the data available in Dataset GSE26787, the DGE of endometrial tissue was evaluated in 5 patients with a history of RPL and in 5 patients with a history of IF, each compared with endometrial biopsies from 5 patients with no history of reproductive failure (control). A list of 138 differentially expressed genes was generated in the RPL group when compared with the control group. Among these, 54 genes were overexpressed in the RPL group, while 84 were underexpressed (Supplementary Table 10). In the IF group, a list of 273 differentially expressed genes was generated, with 120 genes overexpressed in patients with reproductive failure and 153 genes with low expression (Supplementary Table 11).

Taking both studies together, a total of 422 genes with DGE were observed in patients with RPL when compared to fertile patients and a total of 148 genes with DGE in patients with IF (Supplementary Table 12).

3.3. Analysis and comparison of data from different data sources

From the lists of genes obtained in the previous analyses, it was possible to compare the results obtained with the different search strategies (i.e. search in genotype-phenotype databases and gene expression analysis in GEO) and the results obtained for the two reproductive failures (i.e. RPL and IF).

As detailed in Figure 2, comparing the results obtained from the databases, RPL and IF share a total of 26 genes in common (Supplementary Table 8). In the expression analysis, it is possible to observe a total of 148 genes in common between the two reproductive conditions (Supplementary Table 12).

Thus, after the analysis and comparison of the results, a total of 26 genes in common between patients with RPL and IF were observed in the genotype-phenotype databases and a total of 148 genes in common between patients with RPL and IF in the gene expression analysis. No genes in common were observed between these two groups of genes (26 and 148).


Figure 2. Flowchart of results. Comparison of the results of the databases searches between RPL X IF (red), observing a total of 26 genes in common. Comparison of the results of the analysis of differential gene expression between RPL X IF (blue), observing a total of 148 genes in common. Comparison of these two new groups (red x blue), not observing any gene in common (green).

3.4. Protein-protein interaction

Using the STRING program, the direct and indirect interactions of the 26 genes in common between RPL and IF obtained from the OMIM, HuGE and CTD databases were analyzed (Figure 3A). A direct or indirect connection was observed for only 7 of the 26 genes surveyed, that is, only 27% of the analyzed genes.

We next evaluated the interaction of the 148 genes in common between RPL and IF obtained from the differential expression analysis (Figure 3B). Only 10 of the 148 genes had a direct or indirect connection, about 7% of the analyzed genes.

When evaluating the interaction network of these two groups together (i.e. plotting the 26 and 148 genes in the same network), only 16 of the 174 genes interacted, about 9% of the genes analyzed (Figure 3C).

This low interaction between the genes identified in the two search strategies reflects the molecular complexity of all mechanisms involved in reproductive processes. Even with different functions and often involved in unrelated mechanisms, these genes are linked to a network of processes essential for a healthy pregnancy. It is necessary to evaluate the formation of this interaction network, the importance of each gene in its construction and its relationship with the other genes, in order to then determine possible targets which, when affected, impair the functioning of a large cascade of processes.

Figure 3. Interaction network. The network nodes represent proteins. Query proteins and the first shell of interactors are colored, and the second shell of interactors is shown as white nodes. The edges represent protein-protein associations. A) Interaction network of the 26 genes in common between RPL and IF obtained from the OMIM, HuGE and CTD databases. A direct or indirect link is observed between only 7 of the 26 genes surveyed (colored dots). B) Interaction network of the 148 genes in common between RPL and IF according to the differential expression analyses performed with GEO data. C) Interaction network of the 26 genes in common according to the database search plus the 148 genes in common according to GEO.

3.5. Functional analysis

To identify the biological processes in which all of these genes are involved, the 26 genes in common between RPL and IF according to the database search and the 148 genes in common according to the expression analyses were evaluated separately using the Gene Ontology Consortium (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG).

First, we analyzed the main ontologies of the 26 genes in common for RPL and IF from the OMIM, HuGE and CTD databases (Figure 3A). As can be observed in Figure 4, these 26 genes are mainly involved in the inflammatory response, JAK-STAT cascade, cellular response to toxic substances and coagulation, according to KEGG (Figure 4A). According to GO, the main ontologies are related to coagulation, fluid regulation, response to oxidative stress and toxic substances, and smooth muscle cell proliferation (Figure 4B).

The same analysis was performed for the 148 genes in common between RPL and IF in the differential expression analysis. According to GO, the main ontologies in which these genes are involved are related to the transport of ions and carboxylic acid in the cell (Figure 4C). KEGG did not identify any statistically significant pathway to which these genes are related.


Figure 4. Functional Analysis. A) KEGG enrichment for the 26 common genes between RPL and IF in the database search. B) Gene Ontology enrichment for the 26 common genes between RPL and IF in the database search. C) Gene Ontology enrichment for the 148 common genes between RPL and IF in the gene expression analysis.

3.6. Protein-protein interaction and Functional Analysis of all genes with differential expression

As it was not possible to observe common ontologies for the genes with differential expression in both RPL and IF, we decided to analyze all differentially expressed genes in Datasets GSE26787 and GSE65102, regardless of whether they are common to the two reproductive failures. To this end, we performed an interaction analysis and functional enrichment of all genes differentially expressed in the two expression studies analyzed (GSE26787 and GSE65102; Figure 5A).

After comparing all the networks simultaneously, only the genes with interactions were selected and their functional enrichment was performed. In this enrichment, both GO and KEGG identified terms mostly involved in cell division, inflammatory response and ribosome production (Figure 5). As already mentioned, these biological processes are extremely important for correct embryonic implantation, endometrial invasion and cell proliferation.


Figure 5. Protein-protein interaction and Functional Analysis of all genes with differential expression. A) Interaction network formed from all genes with DGE in GEO (GSE26787 and GSE65102), analyzed in Cytoscape v.3.6.0. B) KEGG enrichment for all differentially expressed genes. C) Gene Ontology enrichment for all differentially expressed genes.

When evaluating the pathways of the biological processes involved, it is possible to observe that these genes directly and/or indirectly affect a significant part of the whole mechanism. As shown in Supplementary Figure 1A, many of these genes act at different stages of the cell cycle, affecting all phases of the process (G1, S, G2 and M). In addition, some of these genes are in an upstream position in the biological process and, once differentially expressed, end up influencing the entire subsequent cascade (Supplementary Figure 1B).

3.7. Selecting candidate genes

Based on the results obtained in the previous sections, we selected the most relevant genes in each analysis as possible candidate genes. The relevance of the genes was determined by the following criteria:

First, analyzing the interaction networks formed from all genes differentially expressed in the two expression studies analyzed (Figure 5A), we selected the most relevant genes, taking into account their position in the formation of the network.

The genes that acted as connector hubs, that is, that mediated connections between different clusters (groups of genes that collectively share a generalized function) within the network itself, were selected because they are of considerable importance in the construction of the network: in the absence or non-functioning of these genes, the entire construction is compromised.
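One simplified way to operationalize this idea is to flag nodes whose removal splits the network into more connected pieces. The Python sketch below does this by brute force; it is an assumption about how a connector could be detected, not the criterion used in Cytoscape for this study, and the node names are hypothetical.

```python
def to_adjacency(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def count_components(adjacency, removed=None):
    """Count connected components, optionally ignoring one removed node."""
    seen = set() if removed is None else {removed}
    components = 0
    for start in adjacency:
        if start in seen:
            continue
        components += 1
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
    return components


def connector_nodes(adjacency):
    """Nodes whose removal increases the number of connected components."""
    baseline = count_components(adjacency)
    return sorted(
        node for node in adjacency
        if count_components(adjacency, removed=node) > baseline
    )


# Hypothetical toy network: two triangles joined through a node named HUB.
toy_edges = [
    ("A", "B"), ("B", "C"), ("A", "C"),
    ("C", "HUB"), ("HUB", "D"),
    ("D", "E"), ("E", "F"), ("D", "F"),
]
connectors = connector_nodes(to_adjacency(toy_edges))
```

In this toy network the bridging node and its two attachment points are flagged, since removing any of them separates the two triangles.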

Then, the differential expression of the genes was analyzed according to Datasets GSE26787 and GSE65102. The genes considered connector hubs and the genes that both had interactions in the networks and were differentially expressed were selected, as detailed in Figure 8.

As possible candidate genes in this step, we selected: HMGCS1, ACTA2, ANAPC10, ANG, CCT5, CDC25A, DNAJC6, DNAJC8, EGFR, ELP3, GALNT13, GMPR, GNG11, GRIA3, KCNG1, LMOD1, LRIG3, MPHOSPH10, MYL9, NMD3, NNMT, NOP58, PABPC1L, PCNA, PLA2G2A, PPARGC1A, RGS7, S100A4, SAPCD2, SOCS3, SORBS1, SQLE, TGFA, THBS2, TM4SF4, UNC5B and UTP15 (genes that had interactions in the networks and had differential expression) and HMGCS1, ELP3, NMD3, UTP15, NOP58, CCT5, ACTA2, EGFR, PCNA, ANAPC10 and DNAJC8 (genes considered connector hubs).


Figure 8- Interaction network of all differentially expressed genes after expression analysis. In blue, the genes considered connector hubs. In yellow, the differentially expressed genes interacting in the network.

The next step was to determine the importance of each gene to the construction of the network and how the genes interact with each other. For this, a statistical analysis was performed on the network formed by all differentially expressed genes (Figure 5). Intermediation and Neighborhood Analysis of genes and Modularity Analysis were performed. In this analysis, it was possible to identify which genes in the network carried the highest amount of information and which genes had the highest number of interactions.

As detailed in Figure 9, the genes with the highest number of interactions (warmest colors) are CCT5, CDC20, PCNA, NOP58 and NOP56, and the genes with the greatest amount of information (higher volume) are EGFR, ACTA2, ANXA2 and TGFA.

Then, we selected the genes with the greatest number of interactions and with the greatest amount of information as possible candidate genes, which are: EGFR,


Figure 9. Intermediation and Neighborhood Analysis of all differentially expressed genes after expression analysis.

The number of interactions is indicated by color: genes with the highest number of interactions have the warmest colors (yellow or orange), while genes with the fewest interactions have the coolest colors (green or blue). The flow of information is indicated by volume: genes through which most of the information on the network passes are drawn larger, while genes through which less information passes are drawn smaller.
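The "amount of information" passing through a node corresponds to what network analysis calls betweenness centrality: the number of shortest paths between other node pairs that run through it. A minimal Python sketch of Brandes' algorithm for an unweighted, undirected network is given below as an illustration; the adjacency map is a hypothetical toy input, and the values are left unnormalized, which may differ from the scaling used by Cytoscape or igraph.

```python
from collections import deque


def betweenness(adjacency):
    """Unnormalized betweenness centrality (Brandes' algorithm) for an
    undirected, unweighted graph given as {node: iterable of neighbours}."""
    score = {node: 0.0 for node in adjacency}
    for source in adjacency:
        order = []                                   # nodes in BFS visiting order
        predecessors = {node: [] for node in adjacency}
        paths = {node: 0 for node in adjacency}      # shortest-path counts
        distance = {node: -1 for node in adjacency}
        paths[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbour in adjacency[node]:
                if distance[neighbour] < 0:          # first visit
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
                if distance[neighbour] == distance[node] + 1:
                    paths[neighbour] += paths[node]
                    predecessors[neighbour].append(node)
        # Accumulate dependencies from the farthest nodes back to the source.
        dependency = {node: 0.0 for node in adjacency}
        for node in reversed(order):
            for pred in predecessors[node]:
                dependency[pred] += paths[pred] / paths[node] * (1 + dependency[node])
            if node != source:
                score[node] += dependency[node]
    # Each undirected pair was counted once from each endpoint.
    return {node: value / 2 for node, value in score.items()}


# Hypothetical toy network (assumption): a simple chain A - B - C - D.
toy_network = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
centrality = betweenness(toy_network)
```

In the chain, the two inner nodes each lie on two shortest paths between other pairs, while the end nodes lie on none, so the inner nodes would be drawn with the larger volume.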

Through community analysis, we estimated the network modularity and identified the genes that represent the main connections of the network, that is, those that, if removed, cause a complete loss of connectivity between groups of nodes, greatly affecting the overall topology of the interaction network, so that the propagation of information on the network is largely lost. This evaluation is carried out by the number of genetic interactions, named Modularity Hub Analysis (Supplementary Figure 2A), or by the betweenness centrality / importance of a gene in the network, called Modularity Authority Analysis (Supplementary Figure 2B).

This analysis made it possible, through another approach, to determine the genes that are central to the construction of the network and, therefore, more important for the associated biological process.

As candidate genes, we selected the genes with the highest number of interactions through the Modularity Hub Analysis (MHA) and the genes with the highest centrality through the Modularity Authority Analysis (MAA). For MHA, these are NEK2, DIAPH3, FZR1, BUB1, CDC25A, CKS1B, CDK1, KIF20A, PBK, HJURP, NUSAP1, CCNA2 and CDC 22. For MAA, these are KIF11, CCNA2, DLGAP5, NCAPG, NEK2, NDC80, FZR1, KIF4A, TCP2A, DEROC1 and DIAPH3.

As potential candidates for biomarkers of the reproductive failures studied, we compared the genes of highest relevance in each stage of analysis and then selected those identified by at least two of the methods. That is, connector hub genes with differential expression, genes of highest relevance in the Analysis of Intermediation and Neighborhood, and genes of highest relevance in the Analysis of Modularity Hub and Authority were compared. As a result of this comparison, the most relevant potential biomarkers for reproductive failures were the following candidate genes: PCNA, NOP58, EGFR, ACTA2, CCT5, ANAPC10, NMD3, DNAJC8, ELP3, UTP15, HMGCS1, TGFA, CDC25A, DIAPH3, CCNA2, NEK2 and FZR1.
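The "at least two methods" rule can be expressed as a simple count over the per-method gene lists. The Python sketch below illustrates that rule; the method labels and gene names are hypothetical placeholders and do not reproduce the lists reported in this section.

```python
from collections import Counter


def genes_in_at_least(method_results, minimum=2):
    """Return genes reported by at least `minimum` of the given methods."""
    counts = Counter(
        gene
        for genes in method_results.values()
        for gene in set(genes)  # count each gene once per method
    )
    return sorted(gene for gene, n in counts.items() if n >= minimum)


# Hypothetical toy input (assumption: placeholder names only).
toy_methods = {
    "connector_hubs": ["GENE_A", "GENE_B", "GENE_C"],
    "intermediation_neighborhood": ["GENE_B", "GENE_D"],
    "modularity": ["GENE_C", "GENE_D", "GENE_E", "GENE_E"],
}
candidates = genes_in_at_least(toy_methods)
```

Raising `minimum` to the number of methods would keep only genes supported by every analysis, a stricter variation of the same rule.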

3.8. Literature Search

We then searched the literature for the relationship between these genes (PCNA, NOP58, EGFR, ACTA2, CCT5, ANAPC10, NMD3, DNAJC8, ELP3, UTP15, HMGCS1, TGFA, CDC25A, DIAPH3, CCNA2, NEK2 and FZR1) and the reproductive failures.

Information on these genes was first gathered through GeneCards - The Human Gene Database. Afterwards, the expression of these genes in different reproductive tissues, blood and saliva was analyzed using Expression Atlas - Gene expression across species and biological conditions. Finally, a possible relationship between them and reproductive failures was searched for in PubMed, using the name of the gene AND the same key words used in the database search (Chart 1).

After searching the literature, among these possible candidates, the genes PCNA, EGFR, TGFA and FZR1 have been related to reproductive failures (Meresman et al, 2010; Tazuke et al, 1996; Hofman et al, 1991; Hofmann et al, 1993; Slater et al, 2000; Holt et al, 2014; Seah et al, 2012) (Table 2).

PCNA (Proliferating Cell Nuclear Antigen) is already widely used as a marker of cell proliferation (Turner et al, 1999), and studies have correlated its low expression with infertility and with a history of RPL (Meresman et al, 2010). This gene has also been related to other reproductive failures, such as gestational trophoblastic disease (Kale et al, 2001).

The expression of EGFR (Epidermal Growth Factor Receptor) and other growth factors and their receptors in the uterus during the pre-implantation period (Tazuke et al, 1996) and in decidual and trophoblastic cells (Hofman et al, 1991) suggests that they can affect embryonic implantation in several different ways, playing a critical role in trophoblast invasion, cell differentiation and proliferation (Bass et al, 1994; Li et al, 1997).

In humans, TGFA (Transforming Growth Factor Alpha) expression has already been demonstrated in the endometrium, placenta and decidua (Lysiak, et al, 1993). In addition, the presence of this gene has been observed in syncytiotrophoblast and cytotrophoblast cells, indicating its relationship with trophoblastic invasion and embryonic implantation in an autocrine, paracrine or juxtacrine manner (Massagui et al, 1990). In addition to its role in invasion, proliferation and decidualization processes (Slater et al, 2000), this gene has also been related to embryonic development (Adamson, et al 1993).

In animal models, the gene FZR1 (Fizzy and Cell Division Cycle 20 Related 1) demonstrated a direct association with the proliferation of male and female germ cells, thus acting on their reproductive health (Holt et al, 2014). In another study, also using mice, knocking out this gene in embryos in the pre-implantation phase led to a drastic loss of genome integrity and interruption of embryonic development (Seah et al, 2012).

The other candidates, although not previously related to reproductive failures, act directly in biological processes involved in embryonic implantation and pregnancy (Table 2).

According to GeneCards, genes such as ANAPC10, DNAJC8, CDC25A, CCNA2 and NEK2 are important in the regulation of the cell cycle, acting directly on the balance between cell proliferation and apoptosis. The ACTA2 gene acts directly on tissue vascularization, while genes such as NOP58, ELP3 and UTP15 act directly or indirectly on transcription and translation. Although all of these processes are known to be essential for embryonic implantation and for a successful pregnancy, none of these genes has yet been studied for its relationship with reproductive failures in humans.

According to Expression Atlas, all possible candidates are expressed in reproductive tissues and, with the exception of the EGFR, CDC25A and DIAPH3 genes, all are expressed in saliva and blood, which would make it easier to obtain samples for screening (Supplementary Figure 3).
