Genetic and statistical study of HIV integration in the human genome


AIP Conference Proceedings 1558, 813 (2013); https://doi.org/10.1063/1.4825619. © 2013 AIP Publishing LLC.

Cite as: AIP Conference Proceedings 1558, 813 (2013); https://doi.org/10.1063/1.4825619 Published Online: 17 October 2013

Inês J. Sequeira, Juliana Gonçalves, Elsa Moreira, João T. Mexia, José Rueff, and Aldina Brás

Inês J. Sequeira (a), Juliana Gonçalves (b), Elsa Moreira (a), João T. Mexia (a), José Rueff (b) and Aldina Brás (b)

(a) Department of Mathematics, Faculty of Sciences and Technology, CMA, Universidade Nova de Lisboa, Quinta da Torre, 2829-516 Caparica, PORTUGAL.

(b) Department of Genetics, Faculty of Medical Sciences, CIGMH, Universidade Nova de Lisboa, Rua da Junqueira 100, 1349-008 Lisbon, PORTUGAL.

Abstract. Integration of human immunodeficiency virus (HIV) DNA into the human genome is essential for HIV-induced disease. The human genome is organized into chromosomes, and within these we can define the chromosomal fragile sites. Our aim is to help clarify the integration site preferences of HIV1 and HIV2 with respect to fragile and non-fragile regions. Here we apply statistical techniques, namely non-parametric tests and analysis of variance, to two sets of data on HIV1 and HIV2 integrations in the human genome. The results show that integrations occur with significantly higher intensity in the non-fragile regions of the human genome and that HIV1 in particular is the main contributor to this effect. This study could have implications for human disease.

Keywords: ANOVA, Human Genome, HIV, non-parametric tests

AMS: 62P10, 62G10, 62J10

INTRODUCTION

To complete its life cycle, the human immunodeficiency virus (HIV) has to integrate its genome into the human genome. The human genome is organized into chromosomes: 22 pairs of homologous autosomes plus two sex-determining chromosomes. Within chromosomes, we can define the chromosomal fragile sites (CFSs), which are loci or regions susceptible to the spontaneous or induced occurrence of breaks, rearrangements and viral integrations.

Although human papillomavirus types 16 and 18, hepatitis B virus and the Epstein-Barr virus integrate preferentially in CFSs [1-5], the relation between HIV and CFSs has not yet been clarified. Differences between HIV1 and HIV2 integration sites could explain the differences in their pathogenesis [6].

Statistical methods can help to clarify the integration site preferences of HIV1 and HIV2.

Our main objective is therefore to compare HIV integration sites in CFSs versus non-CFSs. The study of HIV integration sites in the human genome may improve the understanding of the disease itself and of its treatment.

DATA AND METHODS

The human genome was divided into fragile regions (FRs) and non-fragile regions (non-FRs) as follows. Two sequential bands associated with fragile sites were grouped together to form an FR [7]. A region between two separate FRs is considered a non-FR. The complete list of fragile sites was obtained from Mrasek et al. [8]. The genomic positions of FRs and non-FRs were determined using the NCBI Map Viewer database (Build 37.2).

The sequences of HIV1 and HIV2 were obtained from the NCBI GenBank database using the accession numbers from Mitchell et al. [9] and from MacNeil et al. [6], respectively. Then, using BLAST (Basic Local Alignment Search Tool), the sequences were located in the human chromosomes, which gave the starting and ending positions of the integrations. For HIV1 and HIV2 we have 140 and 132 viral integrations, respectively.

Integration sites were co-located with FRs or non-FRs according to their positions.
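The co-location step can be illustrated with a minimal Python sketch. The paper does not state how an integration that straddles a region boundary is assigned, so the midpoint rule used here is an assumption, as are the function name `classify_integration` and the example coordinates.

```python
from bisect import bisect_right

def classify_integration(start, end, fragile_regions):
    """Return 'yes' if the integration midpoint lies in a fragile region, else 'no'.

    fragile_regions: sorted, non-overlapping (region_start, region_end) pairs
    for one chromosome. Using the midpoint is an assumption of this sketch.
    """
    midpoint = (start + end) / 2
    starts = [region_start for region_start, _ in fragile_regions]
    idx = bisect_right(starts, midpoint) - 1  # last region starting at or before the midpoint
    if idx >= 0 and midpoint <= fragile_regions[idx][1]:
        return "yes"
    return "no"

# Hypothetical coordinates, for illustration only (not taken from the study).
example_frs = [(1_000_000, 5_000_000), (20_000_000, 28_000_000)]
labels = [classify_integration(s, e, example_frs)
          for s, e in [(2_000_100, 2_000_400), (10_500_000, 10_500_250)]]
```

Everything outside the listed FR intervals is treated as non-FR, which matches the definition of a non-FR as the region between two separate FRs.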

The Y chromosome was not considered because it does not have fragile regions clearly defined.

In summary, the available data on HIV1 and HIV2 integrations in the human genome give, for each of the 23 chromosomes, the number of viral integrations and their starting and ending positions. From these we could compute the length of each viral integration. The number of integrations available was not the same for every chromosome. The integrations were classified into two groups: yes if they fall into FRs and no if they do not (non-FRs).

In the first phase, our aim is to compare the integrations in FRs versus non-FRs for HIV1 and HIV2 separately, chromosome by chromosome, to see in which region the virus integrates more frequently. However, since the regions have different lengths, simply comparing the number of integrations between FRs and non-FRs would give misleading results. We therefore defined, for each chromosome, the intensities in number of viral integrations as the frequency of integrations occurring in each region (FRs and non-FRs) weighted by the respective length. They are given by

$$i_{FR} = \frac{n_{yes}}{l_{FR}}, \qquad i_{nonFR} = \frac{n_{no}}{l_{nonFR}},$$

where $n_{yes}$ ($n_{no}$) is the number of viral integrations in the FRs (non-FRs) and $l_{FR}$ ($l_{nonFR}$) is the total length of the FRs (non-FRs) in that chromosome. We also defined a second type of measure that considers the extension of the integrations instead of their number. This measure is defined as the ratios

$$r_{FR} = \frac{l_{yes}}{l_{FR}}, \qquad r_{nonFR} = \frac{l_{no}}{l_{nonFR}},$$

where $l_{yes}$ is the length of the viral integrations in the FRs and $l_{no}$ is the length of the viral integrations in the non-FRs in that chromosome. For each chromosome and measure we have one pair (x,y), in which x is the measure in the FRs and y is the measure in the non-FRs. We used these pairs to obtain a graphical representation (Figures 1 and 2) in which each chromosome is located by a point. We plotted the straight line y=x to see whether we had more chromosomes with y > x or more chromosomes with y < x.
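As an illustration of the two measures, the following Python sketch computes the intensity in number and the extension ratio for one chromosome. The function name, the input layout and the numbers are hypothetical, and counting the length of an integration as `end - start + 1` (inclusive coordinates) is an assumption of the sketch.

```python
def integration_measures(integrations, l_fr, l_non_fr):
    """Compute intensity in number (i) and extension ratio (r) for one chromosome.

    integrations: list of (start, end, label), label 'yes' (in FRs) or 'no' (in non-FRs).
    l_fr, l_non_fr: total lengths of the FRs and non-FRs of the chromosome.
    """
    n_yes = sum(1 for _, _, label in integrations if label == "yes")
    n_no = len(integrations) - n_yes
    # Inclusive coordinates are assumed when measuring the length of an integration.
    len_yes = sum(end - start + 1 for start, end, label in integrations if label == "yes")
    len_no = sum(end - start + 1 for start, end, label in integrations if label == "no")
    return {
        "i_FR": n_yes / l_fr, "i_nonFR": n_no / l_non_fr,
        "r_FR": len_yes / l_fr, "r_nonFR": len_no / l_non_fr,
    }

# Hypothetical chromosome: one integration in FRs, two in non-FRs.
measures = integration_measures(
    [(100, 199, "yes"), (1000, 1299, "no"), (5000, 5099, "no")],
    l_fr=1000, l_non_fr=4000)
# The chromosome lies above the line y = x when the non-FR measure is the larger one.
above_line = measures["i_nonFR"] > measures["i_FR"]
```

In this toy case the FRs hold fewer integrations but are also shorter, so the intensity in the FRs is the larger one, which is the kind of reversal the length weighting is meant to expose.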

In order to compare statistically the measures in FRs with those in non-FRs, two different non-parametric tests were used, the sign test and the Wilcoxon signed-rank test [10]. These two tests are appropriate because they are meant for comparing dependent samples, which is the current case: the pairs (x,y) were computed from the same sample of viral integrations in human chromosomes, so they are dependent. The Wilcoxon test is more powerful than the sign test because it extracts more information from the data, in the sense that it accounts for the magnitude of the differences between the two variables [10]. The two tests are similar and serve to confirm each other's results. In case of contradiction, the Wilcoxon test result is taken, for the reason given above.
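The two paired tests can be sketched in plain Python as follows. This is a minimal illustration, not the authors' computation: the two-sided form of the sign test p-value and the dropping of zero differences are assumptions, and the function names are hypothetical. The Wilcoxon statistic T is the smaller of the two signed rank sums, to be compared with a tabulated critical value.

```python
from math import comb

def sign_test_p_value(x, y):
    """Two-sided exact sign test for paired samples; ties (x == y) are dropped."""
    diffs = [b - a for a, b in zip(x, y) if b != a]
    n = len(diffs)
    k = sum(1 for d in diffs if d > 0)  # number of pairs with y > x
    tail = sum(comb(n, j) for j in range(min(k, n - k) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def wilcoxon_t(x, y):
    """Wilcoxon signed-rank statistic T: the smaller of the positive and negative rank sums."""
    diffs = [b - a for a, b in zip(x, y) if b != a]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = (i + j) / 2 + 1  # tied |differences| share their average rank
        for pos in range(i, j + 1):
            ranks[pos] = average_rank
        i = j + 1
    t_plus = sum(r for r, d in zip(ranks, ordered) if d > 0)
    t_minus = sum(r for r, d in zip(ranks, ordered) if d < 0)
    return min(t_plus, t_minus)

# Hypothetical paired measures (x in FRs, y in non-FRs) for five chromosomes.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 4]
p_sign = sign_test_p_value(x, y)
t_obs = wilcoxon_t(x, y)
```

With this convention, equality of the two measures is rejected when T does not exceed the critical value, which is why a small T counts as evidence of a difference.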

In a second phase, a different approach was used. We analyzed the HIV1 and HIV2 integration data jointly, with the aim of detecting statistical differences between the two viruses in how they integrate in FRs and non-FRs. To this end we used a two-way analysis of variance (ANOVA) [11], in which the two factors assessed for their influence on the response variable were the virus, with levels HIV1 and HIV2, and the region, with levels FRs and non-FRs. In ANOVA nomenclature this leads to 4 treatments: HIV1 x FRs, HIV1 x non-FRs, HIV2 x FRs and HIV2 x non-FRs, to which we assigned the samples of intensities in number computed in the first phase. For the purpose of this ANOVA, the intensity of integration is therefore the response variable. For each of the 4 treatments we have 23 intensities, corresponding to the 23 chromosomes. However, since the number of observations per chromosome differs, these intensities cannot be considered replicates within the same treatment. To overcome this issue we computed a different type of intensity that considers the proportion of integrations instead of their number, given by

$$i_{FR} = \frac{n_{yes}/(n_{yes}+n_{no})}{l_{FR}}, \qquad i_{nonFR} = \frac{n_{no}/(n_{yes}+n_{no})}{l_{nonFR}}.$$

In this way, we can use these intensities as replicates, since they can be regarded as different measurements of the same quantity taken under similar conditions. Having replicates also allows us to test the influence of the interaction between the two factors, virus and region, on the intensity of integration. Moreover, because each treatment has the same number of observations (23 intensities), the design is balanced; the ANOVA is therefore robust to departures from normality and even from homoscedasticity [12, 13]. The analysis of variance can thus be used to compare intensities of integration, since normality and independence of the response variable can be assumed.
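A balanced two-way ANOVA with replicates of this kind can be sketched in plain Python. This is an illustrative implementation of the standard sums-of-squares decomposition, not the authors' code; the function name, the nested-list input layout and the toy data are hypothetical.

```python
def two_way_anova(cells):
    """Balanced two-way ANOVA with replicates.

    cells[a][b] is the list of replicates for level a of factor A and level b
    of factor B; every list must have the same length.
    """
    a_levels, b_levels = len(cells), len(cells[0])
    n = len(cells[0][0])  # replicates per treatment

    def mean(values):
        return sum(values) / len(values)

    cell_means = [[mean(cells[a][b]) for b in range(b_levels)] for a in range(a_levels)]
    a_means = [mean(cell_means[a]) for a in range(a_levels)]
    b_means = [mean([cell_means[a][b] for a in range(a_levels)]) for b in range(b_levels)]
    grand = mean(a_means)

    ss_a = n * b_levels * sum((m - grand) ** 2 for m in a_means)
    ss_b = n * a_levels * sum((m - grand) ** 2 for m in b_means)
    ss_ab = n * sum((cell_means[a][b] - a_means[a] - b_means[b] + grand) ** 2
                    for a in range(a_levels) for b in range(b_levels))
    ss_error = sum((v - cell_means[a][b]) ** 2
                   for a in range(a_levels) for b in range(b_levels) for v in cells[a][b])

    df_a, df_b = a_levels - 1, b_levels - 1
    df_ab, df_error = df_a * df_b, a_levels * b_levels * (n - 1)
    ms_error = ss_error / df_error
    return {
        "F_A": (ss_a / df_a) / ms_error,
        "F_B": (ss_b / df_b) / ms_error,
        "F_AB": (ss_ab / df_ab) / ms_error,
        "df_error": df_error,
    }

# Toy data: 2 x 2 treatments with 3 replicates each (not the study data).
toy = [[[1, 2, 3], [4, 5, 6]], [[2, 3, 4], [8, 9, 10]]]
result = two_way_anova(toy)
```

With two levels per factor and 23 replicates per treatment, as in the design described above, the error term has 4 x 22 = 88 degrees of freedom.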

RESULTS AND CONCLUSIONS

Figure 1 suggests that HIV1 integration occurs more in non-FRs, since more points lie above the line y=x. The sign test clearly rejects the hypothesis of equal intensities at the 1% significance level (Table 1). The Wilcoxon test leads to the same decision, since the test statistic Tobs does not exceed the critical value, 62 for n=23 and α=1% (Table 1). Regarding HIV2 integration, Figure 2 suggests that there is no significant difference between integration in FRs and in non-FRs. The sign and Wilcoxon tests agree in not rejecting the hypothesis of equal intensities at the 1% significance level, so both are consistent with no significant difference in HIV2 integration between the two regions (Table 1). We point out that the results were the same whether we used the intensity in number or the extension ratio.

FIGURE 1: (a) Representation of the extension ratio of HIV1 integration for FRs vs. extension ratio for non-FRs. (b) Representation of the intensity number of HIV1 integration for FRs vs. intensity number for non-FRs. Each point represents one chromosome whose coordinates are the values of the respective measure for FRs and non-FRs.

FIGURE 2: (a) Representation of the extension ratio of HIV2 integration for FRs vs. extension ratio for non-FRs. (b) Representation of the intensity number of HIV2 integration for FRs vs. intensity number for non-FRs. Each point represents one chromosome whose coordinates are the values of the respective measure for FRs and non-FRs.

TABLE 1 Results for the sign and Wilcoxon tests

| | Sign test p-value (intensity number) | Sign test p-value (extension ratio) | Wilcoxon Tobs (intensity number) | Wilcoxon Tobs (extension ratio) |
|---|---|---|---|---|
| HIV1 | 0.005 | 0.001 | 50 | 53 |
| HIV2 | 0.339 | 0.202 | 127 | 113 |

The results of the two-way ANOVA on the intensities of integration are presented in Table 2. To perform the ANOVA calculations, the intensities were multiplied by 10^8 in order to obtain the sums of squares (SS) and mean squares (MS) on a larger scale; the resulting value of the F statistic is unchanged. The results show significant values of the F statistic for the factor Region and for the interaction Virus x Region, but no significant value for the factor Virus. This can be interpreted as follows: the factor Region significantly influences the intensity of viral integration, unlike the factor Virus. In particular, Table 3, which presents the average intensities for the 4 treatments and by factor, shows that the intensity of integration in the non-FRs is almost double that in the FRs (1.2546 against 0.6741). The significant interaction Virus x Region means that a particular level of the factor Virus combined with a particular level of the factor Region influences the intensity of integration. Indeed, Table 3 shows that HIV1 in non-FRs has a higher average intensity of integration than the other three treatments.
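As a consistency check, the factor and interaction sums of squares can be recomputed from the treatment means alone, assuming the standard balanced two-way layout with 23 replicates per treatment. The sketch below uses only the four cell means reported in Table 3 and the error SS and degrees of freedom reported in Table 2; the variable names are illustrative.

```python
# Cell means from Table 3 (intensities multiplied by 10^8).
cell_means = {("HIV1", "FR"): 0.5075, ("HIV1", "nonFR"): 1.4898,
              ("HIV2", "FR"): 0.8408, ("HIV2", "nonFR"): 1.0194}
n = 23  # replicates (chromosomes) per treatment
viruses, regions = ("HIV1", "HIV2"), ("FR", "nonFR")

# Marginal means by factor level and the grand mean.
virus_means = {v: sum(cell_means[v, r] for r in regions) / 2 for v in viruses}
region_means = {r: sum(cell_means[v, r] for v in viruses) / 2 for r in regions}
grand = sum(cell_means.values()) / 4

# Sums of squares for a balanced 2 x 2 design with n replicates per cell.
ss_virus = 2 * n * sum((m - grand) ** 2 for m in virus_means.values())
ss_region = 2 * n * sum((m - grand) ** 2 for m in region_means.values())
ss_interaction = n * sum(
    (cell_means[v, r] - virus_means[v] - region_means[r] + grand) ** 2
    for v in viruses for r in regions)

ms_error = 68.32 / 88  # error SS and df as reported in Table 2
f_values = {"Virus": ss_virus / ms_error,
            "Region": ss_region / ms_error,
            "Virus x Region": ss_interaction / ms_error}
```

Rounded to two decimals, these values agree with the SS and F columns of Table 2, and only the Region and Virus x Region statistics exceed the 5% quantile of 3.95.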

TABLE 2 ANOVA summary

| Source of variation | SS | df | MS | F statistic | Quantile F(1, 88; 5%) |
|---|---|---|---|---|---|
| Virus | 0.11 | 1 | 0.11 | 0.14 | 3.95 |
| Region | 7.75 | 1 | 7.75 | 9.98 | 3.95 |
| Virus x Region | 3.71 | 1 | 3.71 | 4.78 | 3.95 |
| Error | 68.32 | 88 | 0.78 | | |
| Total | 79.89 | 91 | | | |

From this two-way ANOVA we conclude that integrations occur with significantly higher intensity in the non-FRs of the human genome and that HIV1 in particular is the main contributor to this effect.

These results are in line with those of the sign and Wilcoxon tests in the first phase, which indicate that HIV1 prefers the non-FRs and that HIV2 has no preference.

These findings differ from those reported for other viruses (see Introduction) and may be due to virus-specific features, human genome-specific features, or both. However, since our study is based on statistical calculations, it is important to emphasize that other variables, acting in vivo, could affect HIV integration site selection. The many factors governing genomic activity may need to be taken into account in order to obtain an accurate view of the integration process.

ACKNOWLEDGMENTS

This work was partially supported by CIGMH (FCM/UNL), PEST-OE/SAU/UI0009/2011 and CMA/FCT/UNL, under the project PEst-OE/MAT/UI0297/2011.

REFERENCES

1. E. C. Thorland, S. L. Myers, B. S. Gostout and D. I. Smith, Oncogene 22, 8 (2003).

2. M. Matovina, I. Sabol, G. Grubisic, N. M. Gasperov and M. Grce, Gynecol Oncol. 113, 1 (2009).

3. I. Kraus, C. Driesch, S. Vinokurova, E. Hovig, A. Schneider, M. von Knebel Doeberitz and M. Durst, Cancer Res. 68, 7 (2008).

4. M. A. Feitelson and J. Lee, Cancer Lett. 252, 2 (2007).

5. W. J. Luo, T. Takakuwa, M. F. Ham, N. Wada, A. Liu, S. Fujita, E. Sakane-Ishikawa and K. Aozasa, Lab. Invest. 84, 9 (2004).

6. A. MacNeil, J. L. Sankalé, S. T. Meloni, A. D. Sarr, S. Mboup and P. Kanki, J. Virol. 80, 15 (2006).

7. A. Laganá, F. Russo, C. Sismeiro, R. Giugno, A. Pulvirenti and A. Ferro, PLoS ONE 5, 6 (2010).

8. K. Mrasek, C. Schoder, A. C. Teichmann, K. Behir, B. Franze, K. Wilhelm, N. Blaurock, U. Claussen, T. Liehr and A. Weise, Int. J. Oncol. 36, 4 (2010).

9. R. S. Mitchell, B. F. Beitzel, A. R. W. Schroder, P. Shinn, H. Chen, C. C. Berry, J. R. Ecker and F. D. Bushman, PLOS Biol. 2, 8 (2004).

10. S. Siegel and N. John Castellan, Nonparametric Statistics for the Behavioral Sciences, 2nd edition, Singapore: McGraw-Hill Statistics Series, 1988.

11. R. Hocking, Methods and Applications of Linear Models, New York: John Wiley & Sons, 2003.

12. K. Ito, Robustness of ANOVA and MANOVA test procedures, In: Krishnaiah PR (ed), Handbook of Statistics, Vol. 1, Amsterdam: North-Holland Publishing Company, 1980.

13. H. Scheffé, The Analysis of Variance, New York: John Wiley & Sons, 1959.

TABLE 3 Means by treatment

| | FRs | non-FRs | Total |
|---|---|---|---|
| HIV1 | 0.5075 | 1.4898 | 0.9986 |
| HIV2 | 0.8408 | 1.0194 | 0.9301 |
| Total | 0.6741 | 1.2546 | 0.9644 |
