
7   Conclusions and future work

7.2   Future work

Analysing certain genes from the standpoint of their repeats may help target medication to the patient rather than to the disease, a principle that is already under discussion. The Canadian Institutes of Health Research presented in [186] its plan to advance patient-oriented research. The plan focuses on several aspects of public health, namely:

• An individual's greater or lesser predisposition to disease risk (including understanding genetic factors, discovering new biological markers, etc.).

• Speeding up and improving screening and diagnostic mechanisms.

• The patient's prognosis when subjected to given conditions.

• Seeking the best strategies for targeting therapy to the patient.

In this field, integrating information systems backed by public databases may pose a major challenge, particularly in building DNA databases used for multiple purposes (medical, forensic, biotechnological, among others); purpose-built software will obviously be needed to carry out the specialised analysis tasks.

CAG repeats in the human HTT gene, which are responsible for Huntington's disease [152, 187-188] when the repeat count exceeds 35 (the normal range is 10 to 34), may come to be detected by analysing an individual's genetic makeup.


It would then be possible to determine whether the values obtained confer a greater susceptibility to the disease on the individual, as well as on their descendants [9]; once again, information systems will remain the pillar needed to support this analysis.
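As an illustration of the kind of check described above, the following minimal sketch counts the longest uninterrupted run of CAG units in a DNA string and compares it with the thresholds quoted in the text (normal 10 to 34, disease-associated above 35). The function names `longest_cag_run` and `classify_cag_count` are hypothetical, the input is assumed to be a plain nucleotide string in the reading direction of the repeat, and the sketch is not the software developed in this work nor a diagnostic procedure.

```python
import re


def longest_cag_run(sequence: str) -> int:
    """Return the number of CAG units in the longest uninterrupted CAG run."""
    best = 0
    # (?:CAG)+ matches one or more consecutive CAG units.
    for match in re.finditer(r"(?:CAG)+", sequence.upper()):
        best = max(best, len(match.group(0)) // 3)
    return best


def classify_cag_count(count: int) -> str:
    """Label a repeat count using only the thresholds quoted in the text."""
    if count > 35:
        return "expanded"
    if 10 <= count <= 34:
        return "normal"
    # Counts below 10, or exactly 35, are not covered by the quoted ranges.
    return "unclassified"


if __name__ == "__main__":
    # Synthetic example sequence (assumption, not real patient data).
    example = "ATG" + "CAG" * 40 + "CCGCCA"
    n = longest_cag_run(example)
    print(n, classify_cag_count(n))
```

Only uninterrupted runs are counted here; handling interrupted or approximate repeats would require the approximate-matching algorithms discussed elsewhere in this work.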

Another field of application that could be explored in this context is data mining. Several tools currently mine genomic data, notably to detect patterns in DNA segments that perform a given function. I am referring more specifically to motifs, an area that is not entirely new but still holds considerable computational potential [189]. Inferring association rules between motifs, building clustering tools, and other methodologies may in the future help detect, in good time, a predisposition to certain health problems, both at the level of the individual and of public health. To that end, very powerful equipment will necessarily have to be produced, not only for sequencing human genomes but also for analysing the data in useful time, which is currently possible only in a very small group of international research communities. This restricted access to technology, together with various legislative barriers, means that real data are not always available for testing the implemented algorithms; in many cases public data, already processed by third parties, are used instead, which limits the value of the results obtained. These are some of the examples in which the applications developed may be used, either through the current implementations or through the inclusion of modules specific to that purpose.
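To make the idea of association rules between motifs concrete, the sketch below computes the two standard measures of a rule "motif A implies motif B" over a set of sequences: support (fraction of sequences containing both motifs) and confidence (fraction of sequences containing A that also contain B). The function name `rule_metrics`, the exact-substring notion of a motif, and the toy sequences are all assumptions made for illustration; a real study would use proper motif models and real genomic data.

```python
from typing import Sequence, Tuple


def rule_metrics(sequences: Sequence[str], motif_a: str, motif_b: str) -> Tuple[float, float]:
    """Return (support, confidence) for the rule motif_a -> motif_b.

    A motif is treated here as an exact substring (simplifying assumption).
    """
    total = len(sequences)
    with_a = [s for s in sequences if motif_a in s]
    with_both = [s for s in with_a if motif_b in s]
    support = len(with_both) / total if total else 0.0
    confidence = len(with_both) / len(with_a) if with_a else 0.0
    return support, confidence


if __name__ == "__main__":
    # Toy sequences (assumption, not real data).
    toy = ["TATAAAGGCGC", "TATAAATTT", "GGCGCAAA", "CCCC"]
    print(rule_metrics(toy, "TATAAA", "GGCGC"))
```

Rules with high support and confidence would be candidates for closer biological inspection; the thresholds to apply are a design choice left open here.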

Finally, the algorithms developed, chiefly those for detecting exact and approximate sequences, may be refined and optimised, notably by incorporating methods based on suffix arrays.
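The following minimal sketch shows how a suffix array can support exact sequence detection: the suffixes of the text are sorted once, and every occurrence of a pattern is then located with two binary searches. It is an illustration only, not the refinement planned for the algorithms of this work; the construction used here is the naive sort-based one (simple, but far slower than dedicated construction algorithms on genome-sized inputs), and the names `build_suffix_array` and `find_occurrences` are hypothetical.

```python
from typing import List


def build_suffix_array(text: str) -> List[int]:
    """Return the start positions of all suffixes of text, in sorted order.

    Naive construction: sorts the suffixes directly (assumption: small inputs).
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, suffix_array: List[int], pattern: str) -> List[int]:
    """Return all start positions of pattern in text, in increasing order."""
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[start:lo])


if __name__ == "__main__":
    dna = "ACAGCAGCAGT"
    sa = build_suffix_array(dna)
    print(find_occurrences(dna, sa, "CAG"))
```

Once the array is built, each query costs a number of comparisons logarithmic in the text length, which is what makes the structure attractive when many patterns are searched in the same genome.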


References

[1] F. Sanger, et al., "The nucleotide sequence of bacteriophage φX174," Journal of molecular biology, vol. 125, pp. 225-246, 1978.

[2] J. Barrett, et al., "Genome-wide association defines more than 30 distinct susceptibility loci for Crohn's disease," Nature genetics, vol. 40, pp. 955-962, 2008.

[3] M. Morley, et al., "Genetic analysis of genome-wide variation in human gene expression," Nature, vol. 430, pp. 743-747, 2004.

[4] H. Rheinberger, et al., "Three tRNA binding sites on Escherichia coli ribosomes," Proc Natl Acad Sci USA, vol. 78, pp. 5310-5314, 1981.

[5] M. Fardilha, et al., "A importância do mecanismo de “splicing” alternativo para a identificação de novos alvos terapêuticos," Acta Urológica, vol. 25, pp. 39-47, 2008.

[6] F. Lee. (2010, 28-08-2010). Molecular Biology Web Book. Available: http://www.web-books.com/MoBio/Free/Ch7F3.htm

[7] K. A. Freed, et al., "Detection of CAG repeats in pre-eclampsia/eclampsia using the repeat expansion detection method," Mol. Hum. Reprod., vol. 11, pp. 481-487, July 1, 2005.

[8] P. Ferro, et al., "The androgen receptor CAG repeat: a modifier of carcinogenesis?," Molecular and Cellular Endocrinology, vol. 193, pp. 109-120, 2002.

[9] Pearson. (2007, 9-02-2009). Repeat Disease Database. Available: http://www.cepearsonlab.com/rdd.php

[10] S. Subramanian, et al., "Triplet repeats in human genome: distribution and their association with genes and other genomic regions," Bioinformatics, vol. 19, p. 549, 2003.

[11] Y. Haberman, et al., "Trinucleotide repeats are prevalent among cancer-related genes," Trends in Genetics, vol. 24, pp. 14-18, 2008.

[12] A. Chapman, "England's Leonardo: Robert Hooke (1635-1703) and the art of experiment in Restoration England," 1996, pp. 239-276.

[13] G. Mendel, Experiments in plant hybridisation: Cosimo, Inc., 2008.

[14] L. Wong, The practical bioinformatician: World Scientific Pub Co Inc, 2004.

[15] A. Nakabachi, et al., "The 160-Kilobase Genome of the Bacterial Endosymbiont Carsonella," Science, vol. 314, p. 267, October 13, 2006.

[16] S. G. Gregory, et al., "A physical map of the mouse genome," Nature, vol. 418, pp. 743-50, Aug 15 2002.


[17] P. V. Baranov, et al., "Codon size reduction as the origin of the triplet genetic code," PLoS One, vol. 4, p. e5708, 2009.

[18] Delgado, Jr., "The genial gene: deconstructing Darwinian selfishness," Choice: Current Reviews for Academic Libraries, vol. 47, pp. 135-135, 2009.

[19] T. G. Boyer, et al., "Genome mining for human cancer genes: wherefore art thou?," Trends in Molecular Medicine, vol. 7, pp. 187-189, 2001.

[20] I. Rigoutsos and G. Stephanopoulos, Systems Biology. Volume I: Genomics: Oxford University Press, 2006.

[21] J.-M. Claverie, "GENE NUMBER: What If There Are Only 30,000 Human Genes?," Science, vol. 291, pp. 1255-1257, February 16, 2001.

[22] L. Wong, The practical bioinformatician (duplicate): World Scientific Publishing Co. Pte. Ltd., 2004.

[23] L. Duret, et al., "Strong conservation of non-coding sequences during vertebrates evolution: potential involvement in post-transcriptional regulation of gene expression," Nucl. Acids Res., vol. 21, pp. 2315-2322, May 25, 1993.

[24] L. Flanking, "Primer on Molecular Genetics," 1992.

[25] J. S. Andersen, et al., "Nucleolar proteome dynamics," Nature, vol. 433, pp. 77-83, 2005.

[26] J. Collinge, "PRION DISEASES OF HUMANS AND ANIMALS: Their Causes and Molecular Basis," Annual Review of Neuroscience, vol. 24, pp. 519-550, 2001.

[27] E. Keedwell and A. Narayanan, Intelligent bioinformatics: the application of artificial intelligence techniques to bioinformatics problems: John Wiley & Sons Inc, 2005.

[28] P. d. Oliveira. (2008, 29-08-2010). Manual de genética. Available: http://home.dbio.uevora.pt/~oliveira/Bio/Manual/i4.htm

[29] R. Belshaw, et al., "Long-term reinfection of the human genome by endogenous retroviruses," Proceedings of the National Academy of Sciences of the United States of America, vol. 101, p. 4894, 2004.

[30] T. Ogawa and T. Okazaki, "Discontinuous DNA Replication," Annual Review of Biochemistry, vol. 49, pp. 421-457, 1980.

[31] J. Finsterer, "Bulbar and spinal muscular atrophy (Kennedy’s disease): a review," European Journal of Neurology, vol. 16, pp. 556-561, 2009.

[32] T. A. Kunkel and D. A. Erie, "DNA Mismatch Repair," Annual Review of Biochemistry, vol. 74, pp. 681-710, 2005.

[33] E. Hoffman, "Skipping toward personalized molecular medicine," The New England journal of medicine, vol. 357, p. 2719, 2007.

[34] R. Roeder, "The role of general initiation factors in transcription by RNA polymerase II," Trends in biochemical sciences, vol. 21, pp. 327-334, 1996.

[35] P. Cramer, et al., "Structural basis of transcription: RNA polymerase II at 2.8 angstrom resolution," Science, vol. 292, p. 1863, 2001.

[36] M. Kimura, "Evolutionary rate at the molecular level," Nature, vol. 217, pp. 624-626, 1968.

[37] M. Arnold, Evolution through genetic exchange: Oxford University Press, USA, 2006.

[38] X. Xia, "How optimized is the translational machinery in Escherichia coli, Salmonella typhimurium and Saccharomyces cerevisiae?," Genetics, vol. 149, pp. 37-44, 1998.


[39] H. Dong, et al., "Co-variation of tRNA abundance and codon usage in Escherichia coli at different growth rates," J Mol Biol, vol. 260, pp. 649-663, 1996.

[40] S. Boycheva, et al., "Codon pairs in the genome of Escherichia coli," Bioinformatics, vol. 19, pp. 987-998, 2003.

[41] G. Moura, et al., "Comparative context analysis of codon pairs on an ORFeome scale," Genome Biol, vol. 6, p. R28, 2005.

[42] S. Kanaya, et al., "Codon usage and tRNA genes in eukaryotes: correlation of codon usage diversity with translation efficiency and with CG-dinucleotide usage as assessed by multivariate analysis," J Mol Evol, vol. 53, pp. 290-298, 2001.

[43] D. Wilson and K. Nierhaus, "The E-site story: the importance of maintaining two tRNAs on the ribosome during protein synthesis," Cell Mol Life Sci, vol. 63, pp. 2725-2737, 2006.

[44] F. Wettstein and H. Noll, "Binding of transfer ribonucleic acid to ribosomes engaged in protein synthesis: number and properties of ribosomal binding sites," J Mol Biol, vol. 11, pp. 35-53, 1965.

[45] K. Nierhaus, "Decoding errors and the involvement of the E-site," Biochimie, vol. 88, pp. 1013-1019, 2006.

[46] K. Nierhaus, "The allosteric three-site model for the ribosomal elongation cycle: features and future," Biochemistry, vol. 29, pp. 4997-5008, 1990.

[47] A. Korostelev, et al., "Crystal structure of a 70S ribosome-tRNA complex reveals functional interactions and rearrangements," Cell, vol. 126, pp. 1065-1077, 2006.

[48] A. Shah, et al., "Computational identification of putative programmed translational frameshift sites," Bioinformatics, vol. 18, pp. 1046-1053, 2002.

[49] C. Bertrand, et al., "Influence of the stacking potential of the base 3' of tandem shift codons on -1 ribosomal frameshifting used for gene expression," RNA, vol. 8, pp. 16-28, 2002.

[50] J. George Chin and C. S. Lansing, "Capturing and supporting contexts for scientific data sharing via the biological sciences collaboratory," presented at the Proceedings of the 2004 ACM conference on Computer supported cooperative work, Chicago, Illinois, USA, 2004.

[51] M. Crochemore, et al., Algorithms on strings: Cambridge Univ Pr, 2007.

[52] A. Srikantha, et al., "A fast algorithm for exact sequence search in biological sequences using polyphase decomposition," Bioinformatics, vol. 26, pp. i414-i419, September 15, 2010.

[53] S. Offner, "Using the NCBI Genome Databases to Compare the Genes for Human & Chimpanzee Beta Hemoglobin," The American Biology Teacher, vol. 72, pp. 252-256, 2010.

[54] D. L. Wheeler, et al., "Database resources of the National Center for Biotechnology Information," Nucl. Acids Res., p. gkl1031, December 14, 2006.

[55] NCBI. (2010, 23-08-2010). National Center for Biotechnology Information. Available: http://www.ncbi.nlm.nih.gov/

[56] (05-09-2010). The Broad Institute of MIT and Harvard. Available: http://www.broadinstitute.org/

[57] B. J. Haas, et al., "Genome sequence and analysis of the Irish potato famine pathogen Phytophthora infestans," Nature, vol. 461, pp. 393-398, 2009.

[58] G. R. Cochrane and M. Y. Galperin, "The 2010 Nucleic Acids Research Database Issue and online Database Collection: a community of data resources," Nucl. Acids


[59] D. A. Benson, et al., "GenBank," Nucl. Acids Res., vol. 37, pp. D26-31, January 1, 2009.

[60] A. Hamosh, et al., "Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders," Nucleic Acids Research, vol. 33, p. D514, 2005.

[61] OMIM. (2009, OMIM, Online Mendelian Inheritance in Man. Available: http://www.ncbi.nlm.nih.gov/omim/

[62] KEGG. (2010, 20-03-2010). KEGG: Kyoto Encyclopedia of Genes and Genomes. Available: http://www.kegg.com

[63] M. Kanehisa, et al., "KEGG for representation and analysis of molecular networks involving diseases and drugs," Nucl. Acids Res., vol. 38, pp. D355-360, January 1, 2010.

[64] T. Hubbard, et al., "The Ensembl genome database project," Nucl. Acids Res., vol. 30, pp. 38-41, January 1, 2002.

[65] L. Benuskova and R. Scurr. (2010, 19-10-2010). Global Alignment. Available: http://www.cs.otago.ac.nz/cosc348/alignments/Lecture05_GlobalAlignment.pdf

[66] V. Bafna, et al., "Approximation algorithms for multiple sequence alignment," Theoretical Computer Science, vol. 182, pp. 233-244, 1997.

[67] S. Needleman and C. Wunsch, "A general method applicable to the search for similarities in the amino acid sequence of two proteins," Journal of molecular biology, vol. 48, pp. 443-453, 1970.

[68] D. S. Hirschberg, "Serial computations of Levenshtein distances," in Pattern matching algorithms, ed: Oxford University Press, 1997, pp. 123-141.

[69] T. Smith and M. Waterman, "Identification of common molecular subsequences," J. Mol. Biol., vol. 147, pp. 195-197, 1981.

[70] S. Altschul, et al., "Basic local alignment search tool," Journal of molecular biology, vol. 215, pp. 403-410, 1990.

[71] J. Stoye, "Multiple sequence alignment with the divide-and-conquer method," Gene, vol. 211, pp. GC45-GC56, 1998.

[72] D. Powell, et al., "A versatile divide and conquer technique for optimal string alignment," Information Processing Letters, vol. 70, pp. 127-139, 1999.

[73] E. Ukkonen, "Algorithms for approximate string matching," Information and control, vol. 64, pp. 100-118, 1985.

[74] D. Sokol, et al., "Tandem repeats over the edit distance," Bioinformatics, vol. 23, pp. e30-e35, January 15, 2007.

[75] B. Ma, et al., "PatternHunter: faster and more sensitive homology search," Bioinformatics, vol. 18, pp. 440-445, March 1, 2002.

[76] M. Li, et al., "PatternHunter II: Highly sensitive and fast homology search," Genome Informatics Series, pp. 164-175, 2003.

[77] W. J. Kent, "BLAT - The BLAST-Like Alignment Tool," Genome Research, vol. 12, pp. 656-664, April 1, 2002.

[78] D. Higgins and P. Sharp, "CLUSTAL: a package for performing multiple sequence alignment on a microcomputer," Gene, vol. 73, pp. 237-244, 1988.

[79] M. A. Larkin, et al., "Clustal W and Clustal X version 2.0," Bioinformatics, vol. 23, pp. 2947-8, Nov 1 2007.

[80] A. Budd. (2009, 11-09-2010). Multiple Sequence Alignments - Exercices and ... Available: http://www.embl.de/~seqanal/courses/commonCourseContent/commonMsaExercises.html

[81] A. L. Delcher, et al., "Alignment of whole genomes," Nucl. Acids Res., vol. 27, pp. 2369-2376, January 1, 1999.

[82] A. L. Delcher, et al., "Fast algorithms for large-scale genome alignment and comparison," Nucleic Acids Research, vol. 30, pp. 2478-2483, June 1, 2002.

[83] S. Kurtz, et al., "Versatile and open software for comparing large genomes," Genome Biology, vol. 5, p. R12, 2004.

[84] D. Russell, et al., "Grammar-based distance in progressive multiple sequence alignment," BMC Bioinformatics, vol. 9, p. 306, 2008.

[85] D. Lipman and W. Pearson, "Rapid and sensitive protein similarity searches," Science, vol. 227, p. 1435, 1985.

[86] X. Huang, et al., "A space-efficient algorithm for local similarities," Computer applications in the biosciences : CABIOS, vol. 6, pp. 373-381, October 1, 1990.

[87] R. Edgar, "MUSCLE: Multiple sequence alignment with high score accuracy and high throughput," Nucleic Acids Res, vol. 32, pp. 1792-1797, 2004.

[88] T. Treangen, et al., "A novel heuristic for local multiple alignment of interspersed DNA repeats," IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), vol. 6, pp. 180-189, 2009.

[89] C. Notredame, et al., "T-Coffee: a novel algorithm for multiple sequence alignment," J Mol Biol, vol. 302, pp. 205-217, 2000.

[90] K. Katoh, et al., "MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform," Nucleic Acids Res, vol. 30, pp. 3059-3066, 2002.

[91] T. Lassmann and E. Sonnhammer, "Kalign - an accurate and fast multiple sequence alignment algorithm," BMC Bioinformatics, vol. 6, p. 298, 2005.

[92] C. Do, et al., "ProbCons: probabilistic consistency-based multiple sequence alignment," Genome Research, vol. 15, p. 330, 2005.

[93] L. Parida, et al., "MUSCA: an algorithm for constrained alignment of multiple data sequences," Genome Informatics Series, pp. 112-119, 1998.

[94] A. Subramanian, et al., "DIALIGN-TX: greedy and progressive approaches for segment-based multiple sequence alignment," Algorithms for Molecular Biology, vol. 3, p. 6, 2008.

[95] J. S. Papadopoulos and R. Agarwala, "COBALT: constraint-based alignment tool for multiple protein sequences," Bioinformatics, vol. 23, pp. 1073-1079, May 1, 2007.

[96] W. R. Pearson. (2010, 20-10-2010). FASTA Tools. Available: http://www.ebi.ac.uk/Tools/sss/fasta/help/index-protein.html#program

[97] R. Kolpakov and G. Kucherov, "Finding Approximate Repetitions under Hamming Distance," in Algorithms — ESA 2001. vol. 2161, F. auf der Heide, Ed., ed: Springer Berlin / Heidelberg, 2001, pp. 170-181.

[98] S. Henikoff and J. Henikoff, "Amino acid substitution matrices from protein blocks," Proceedings of the National Academy of Sciences of the United States of America, vol. 89, p. 10915, 1992.

[99] A. Elofsson. (2000, 29-09-2010). Scoring Matrices and gap penalties. Available: http://bioinfo.se/kurser/swell/substmatrix.html

[100] (2007, 30-09-2010). Scoring Matrix. Available:


[101] M. Dayhoff and R. Schwartz, "A model of evolutionary change in proteins," 1978.

[102] G. Gonnet, et al., "Exhaustive matching of the entire protein sequence database," Science, vol. 256, p. 1443, 1992.

[103] NCBI. (2010, 20-10-2010). The Statistics of Sequence Similarity Scores. Available: http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html

[104] E. Rocha, "Folhas de BioInformática e Análise de sequências," Centre national de la recherche scientifique, 2001.

[105] S. Henikoff and J. G. Henikoff, "Performance evaluation of amino acid substitution matrices," Proteins, vol. 17, pp. 49-61, Sep 1993.

[106] W. R. Pearson, "Comparison of methods for searching protein sequence databases," Protein Sci, vol. 4, pp. 1145-60, Jun 1995.

[107] R. Mott, Smith–Waterman Algorithm: John Wiley & Sons, Ltd, 2001.

[108] C. Bio, "Bioinformatics explained: Smith-Waterman," 2007.

[109] G. Barton, "Protein Sequence Alignment and Database Scanning," in Protein Structure prediction - a practical approach, M. J. E. Sternberg, Ed., ed: Oxford University Press, 1996.

[110] S. F. Altschul, et al., "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs," Nucleic Acids Res, vol. 25, pp. 3389-402, Sep 1 1997.

[111] MPI. (10-04-2010). mpiBLAST. Available: http://www.mpiblast.org

[112] NVidia. (10-04-2010). BLASTp on Tesla. Available: http://www.nvidia.com/object/blastp_on_tesla.html

[113] Mitrion. (10-04-2010). OpenBio. Available: http://mitc-openbio.sourceforge.net/

[114] A. Biocomputing. (14-04-2010). AB-BLAST. Available: http://www.advbiocomp.com/blast.html

[115] TimeLogic. (14-04-2010). Tera BLAST. Available: http://www.timelogic.com/decypher_blast.html

[116] TimeLogic. (15-09-2010). TimeLogic Biocomputing Solutions. Available: http://www.timelogic.com/decypher_citations.html

[117] K. Thompson, "Programming Techniques: Regular expression search algorithm," Commun. ACM, vol. 11, pp. 419-422, 1968.

[118] J. Morris and V. Pratt, "A linear pattern-matching algorithm," Technical Report 40, University of California, Berkeley, 1970.

[119] D. Gusfield, Algorithms on strings, trees and sequences: computer science and computational biology: Oxford University, 1999.

[120] D. E. Knuth, et al., "Fast Pattern Matching in Strings," SIAM Journal on Computing, vol. 6, pp. 323-350, 1977.

[121] I. Simon, "String matching algorithms and automata," in Results and Trends in Theoretical Computer Science. vol. 812, J. Karhumäki, et al., Eds., ed: Springer Berlin / Heidelberg, 1994, pp. 386-395.

[122] A. Apostolico and Z. Galil, Pattern matching algorithms. New York: Oxford University Press, 1997.

[123] P. Weiner, "Linear pattern matching algorithms," 1973, pp. 1-11.

[124] E. McCreight, "A space-economical suffix tree construction algorithm," Journal of the ACM (JACM), vol. 23, pp. 262-272, 1976.

[125] E. Ukkonen, "On-line construction of suffix trees," Algorithmica, vol. 14, pp. 249-260, 1995.


[126] U. Manber and G. Myers, "Suffix arrays: a new method for on-line string searches," 1990, pp. 319-327.

[127] A. Aho and M. Corasick, "Efficient string matching: an aid to bibliographic search," Communications of the ACM, vol. 18, pp. 333-340, 1975.

[128] R. S. Boyer and J. S. Moore, "A fast string searching algorithm," Commun. ACM, vol. 20, pp. 762-772, 1977.

[129] M. Crochemore and W. Rytter, Text algorithms: Oxford University Press, Inc., 1994.

[130] R. Karp and M. Rabin, "Efficient randomized pattern-matching algorithms," IBM Journal of Research and Development, vol. 31, p. 249, 1987.

[131] X. Wang, et al., "Collisions for hash functions MD4, MD5, HAVAL-128 and RIPEMD."

[132] C. Charras and T. Lecroq, Handbook of exact string matching algorithms: Citeseer, 2004.

[133] E. Lander, et al., "Initial sequencing and analysis of the human genome," Nature, vol. 409, pp. 860-921, 2001.

[134] G. Moura, et al., "Large scale comparative codon-pair context analysis unveils general rules that fine-tune evolution of mRNA primary structure," PLoS One, vol. 2, p. e847, 2007.

[135] M. Santos and M. Tuite, "The CUG codon is decoded in vivo as serine and not leucine in Candida albicans," Nucleic Acids Res, vol. 23, pp. 1481-1486, 1995.

[136] C. Marck, et al., "The RNA polymerase III-dependent family of genes in hemiascomycetes: comparative RNomics, decoding strategies, transcription and evolutionary implications," Nucleic Acids Research, vol. 34, p. 1816, 2006.

[137] S. K. Shin and G. L. Sanders, "Denormalization strategies for data retrieval from data warehouses," Decision Support Systems, vol. 42, pp. 267-282, 2006.

[138] B. Louie, et al., "Data integration and genomic medicine," Journal of Biomedical Informatics, vol. 40, pp. 5-16, 2007.

[139] J. Han and M. Kamber, Data Mining: Concepts and Techniques, 2nd ed.: Morgan Kaufmann Publishers, 2006.

[140] R. M. Wideman, "Software Development and Linearity (Or, why some project management methodologies don’t work)," Projects & Profits, 2003.

[141] S. Tripp and B. Bichelmeyer, "Rapid prototyping: An alternative instructional design strategy," Educational Technology Research and Development, vol. 38, pp. 31-44, 1990.

[142] G. R. Moura, et al., "Codon-triplet context unveils unique features of the Candida albicans protein coding genome," BMC Genomics, vol. 8, p. 444, 2007.

[143] J. P. Lousado, et al., "Exploiting Codon-Triplets Association for Genome Primary Structure Analysis," presented at the Biocomputation, Bioinformatics, and Biomedical Technologies, 2008. BIOTECHNO '08. International Conference on, Bucharest, Romania, 2008.

[144] J. P. Lousado, et al., "GeneSplit - Uma Aplicação para o Estudo de Associações de Codões e de Aminoácidos em ORFeomas," in CISTI 2008: 3ª Conferencia Ibérica de Sistemas y Tecnologías de la Información, Ourense, 2008.

[145] R. A. George, et al., "Analysis of protein sequence and interaction data for candidate disease gene prediction," Nucl. Acids Res., vol. 34, p. e130, November 14, 2006.


[146] S. Ali, et al., "Analysis of the evolutionarily conserved repeat motifs in the genome of the highly endangered central Indian swamp deer Cervus duvauceli branderi," Gene, vol. 223, pp. 361-367, 1998.

[147] Z. Fu and T. Jiang, "Clustering of main orthologs for multiple genomes," J Bioinform Comput Biol, vol. 6, pp. 573-84, Jun 2008.

[148] N. C. Jones and P. A. Pevzner, "Comparative genomics reveals unusually long motifs in mammalian genomes," Bioinformatics, vol. 22, pp. e236-242, July 15, 2006.

[149] M. Brameier and C. Wiuf, "Ab initio identification of human microRNAs based on structure motifs," BMC Bioinformatics, vol. 8, p. 478, 2007.

[150] T. a. Bowen, et al., "Repeat sizes at CAG/CTG loci CTG18.1, ERDA1 and TGC13-7a in schizophrenia," Psychiatric Genetics, vol. 10, pp. 33-37, 2000.

[151] T. V. Pestova, et al., "A conserved AUG triplet in the 5' nontranslated region of poliovirus can function as an initiation codon in vitro and in vivo," Virology, vol. 204, pp. 729-37, Nov 1 1994.

[152] Y. O. Herishanu, et al., "Huntington disease in subjects from an Israeli Karaite community carrying alleles of intermediate and expanded CAG repeats in the HTT gene: Huntington disease or phenocopy?," Journal of the Neurological Sciences, vol. 277, pp. 143-146, 2009.

[153] V. Bogaerts, et al., "Genetic findings in Parkinson's disease and translation into treatment: a leading role for mitochondria?," Genes Brain Behav, vol. 7, pp. 129-51, Mar 2008.

[154] M. A. Mena, et al., "On the pathogenesis and neuroprotective treatment of Parkinson disease: what have we learned from the genetic forms of this disease?," Curr Med Chem, vol. 15, pp. 2305-20, 2008.

[155] B. A. Tarini, et al., "Parents Interest in Predictive Genetic Testing for Their Children When a Disease Has No Treatment," Pediatrics, vol. 124, pp. e432-e438, Aug 24 2009.

[156] W. Hsueh, "Genetic discoveries as the basis of personalized therapy: rosiglitazone treatment of Alzheimer's disease," Pharmacogenomics J, vol. 6, pp. 222-4, Jul-Aug 2006.

[157] M. P. Gabriela Moura, Raquel Silva, Isabel Miranda, Vera Afreixo, Gaspar Dias, Adelaide Freitas, José L Oliveira, and Manuel AS Santos, "Comparative context analysis of codon pairs on an ORFeome scale," Genome Biology, vol. 6, 2005.

[158] D. B. Gordon, et al., "TAMO: a flexible, object-oriented framework for analyzing transcriptional regulation using DNA-sequence motifs," Bioinformatics, vol. 21, pp.