HOMOLOGY MODELLING AND SEQUENCE ANALYSIS OF anxC3.1


Patchikolla Sateesh* (1), Prof. Allam Appa Rao (2), Suresh Kumar Sangeeta (3), M. Naresh Babu (4), R.S. Datta Teja Grandhi (5).

1. Associate Professor, Dept. of Computer Science, M.V.G.R College of Engg., Vizianagaram, A.P., India. 09246615251, 0891-6644143

2. Vice Chancellor, Jawaharlal Nehru Technological University: Kakinada, Kakinada 533 003, AP, India.

3. Suresh Kumar Sangeeta, M.Sc. Bioinformatics, GITAM University, Vishakapatnam.

4. Lecturer, CSE Department, JNTU Kakinada, Kakinada 533 003, AP, India.

5. Research Scholar, Vizag, A.P., India.

ABSTRACT:

During the last two decades, the number of proteins of known sequence has increased rapidly, whereas the number of proteins of known structure has grown much more slowly. This imbalance critically limits our ability to understand the molecular mechanisms of proteins and to carry out timely structure-based drug design using information from newly found sequences. It is therefore highly desirable to model the 3D structure of proteins with structural bioinformatics approaches such as homology modelling. In this study, homology modelling was used to build a 3D structure of anxC3.1 from Aspergillus fumigatus Af293.

An annexin, anxC3.1, was isolated and characterized from the industrially important filamentous fungus Aspergillus niger. anxC3.1 is a single-copy gene encoding a predicted protein of 506 amino acids that contains four annexin repeats. anxC3.1 expression was unaltered under a variety of conditions, such as increased secretion, an altered nitrogen source, heat shock and decreased Ca2+ levels, indicating that anxC3.1 is constitutively expressed. This is the first reported functional characterization of a fungal annexin, so a structural model of this protein is highly desirable. The pairwise sequence alignment program BLAST was used to study the influence of the scoring matrix on sequence alignment. The BLOSUM62 matrix revealed a clear evolutionary relationship, with 30% identity to the PDB protein 1M9IA. A homology model was generated with Geno3D, and the model with the lowest energy (-14401.90 kcal/mol, RMSD 0.9843 Å) was considered the best optimized and superimposed model. The structure was validated by PROCHECK analysis: the model is accurate and its stereochemical quality is good.

KEYWORDS:

BLAST (Basic Local Alignment Search Tool), PDB (Protein Data Bank), SPDBV (Swiss-PdbViewer), PIR (Protein Information Resource).

Patchikolla Sateesh et. al. / International Journal of Engineering Science and Technology Vol. 2(5), 2010, 1125-1130

INTRODUCTION:

The annexins are a family of calcium- and phospholipid-binding proteins that have been widely studied in animals.

Investigation of annexins in the fungus Aspergillus fumigatus identified a novel annexin-like gene (ANXC4) as well as two conventional annexins (ANXC3.1 and ANXC3.2). Conventional annexins such as ANXC3.2 are hypothetical proteins, and their role in Aspergillus fumigatus has not yet been predicted. An annexin is a protein that binds at least one calcium atom, or a protein whose function is calcium-dependent. Calcium (chemical symbol Ca) is a metal that is essential for a variety of bodily functions, such as neurotransmission, muscle contraction and proper heart function. The genes were initially identified by bioinformatics, and their sequences were then determined experimentally.

ANXC4 lacked calcium-binding consensus sequences and had a 553-residue N-terminal tail. However, bioinformatic analysis indicated that ANXC4 is an annexin, and homologues were identified in other filamentous fungi. ANXC4 therefore represents a new grouping within the annexin family.

METHODS:

CHOICE OF TEMPLATE

Homologues of the target sequence were identified by searching the PDB database at NCBI with PSI-BLAST [Altschul et al 1997]. The template chosen was the sequence from the latest version of the PDB database with the lowest expected value (E-value) and the highest score after four iterations.
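The E-value criterion used to pick the template follows standard BLAST statistics (general Karlin-Altschul theory, not something specific to this study). A raw alignment score S is converted to a bit score S' using the scoring system's parameters λ and K, and the expected number of chance hits E then depends on the (effective) query length m and database length n:

```latex
S' = \frac{\lambda S - \ln K}{\ln 2}, \qquad E = m\,n\,2^{-S'}
```

For a fixed search space, each additional bit of score halves the E-value, which is why the hit with the highest score is normally also the hit with the lowest E-value.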

SEQUENCE ALIGNMENT

Sequences were aligned with their template structures using the alignment.malign() command in the SPDBV tool. The software also takes structural information from the template into account when constructing an alignment.

SPDBV was used to align all target sequences in the .ali file with their corresponding template structures in the PDB files. Finally, the alignment was written out in two formats, including PIR; the PIR file is used by SPDBV in the subsequent model-building stage. Details of the modelling and sequence-alignment scripts are provided as supplementary material.

BLAST:

The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.
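BLAST does not run full dynamic programming: it seeds on short word hits and extends them, as a fast approximation of Smith-Waterman local alignment. The sketch below shows the exact local-alignment recurrence that BLAST approximates, using a simple match/mismatch/linear-gap scheme. The scores are an assumption for illustration; blastp itself scores with substitution matrices such as BLOSUM62 and affine gap penalties.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> tuple[int, str, str]:
    """Exact local alignment; returns (best score, aligned a, aligned b)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # DP matrix, first row/column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in b
            left = H[i][j - 1] + gap    # gap in a
            H[i][j] = max(0, diag, up, left)  # 0 lets a local alignment restart
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a: list[str] = []
    out_b: list[str] = []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `smith_waterman("XXACGTYY", "ZACGTZ")` returns `(8, "ACGT", "ACGT")`: only the shared core is reported, which is what "local similarity" means. The cost is proportional to the product of the sequence lengths, which is why BLAST uses heuristics for database-scale searches.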


HOMOLOGY MODELLING:

A 3D model of the target sequence was constructed with the automodel class of SPDBV, which generated five similar models of the target sequence based on its template structure and the alignment input file 'filename.ali' (PIR format). The best model was selected as the one with the lowest SPDBV objective function value, which is reported in the second line of the model PDB file.

QUALITY OF HOMOLOGY MODEL:

The quality of the structures was analysed with the PROCHECK program [Laskowski et al 1993], which calculates the main-chain torsion angles and displays them as a Ramachandran plot [Ramachandran et al 1968].
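A Ramachandran plot is built from the backbone torsion angles phi (C(i-1)-N-CA-C) and psi (N-CA-C-N(i+1)) of each residue. A minimal sketch of that first step, assuming backbone coordinates have already been read from the model PDB file (parsing is not shown, and the function names are hypothetical): each angle is the dihedral between two planes, computed with the standard atan2 formula and the IUPAC sign convention.

```python
import math

Vec3 = tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec3, p1: Vec3, p2: Vec3, p3: Vec3) -> float:
    """Torsion angle in degrees (-180..180) about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(prev_c: Vec3, n: Vec3, ca: Vec3, c: Vec3,
            next_n: Vec3) -> tuple[float, float]:
    """Backbone (phi, psi) of one residue from its neighbouring atoms."""
    return dihedral(prev_c, n, ca, c), dihedral(n, ca, c, next_n)
```

The first and last residues of a chain lack C(i-1) or N(i+1), so they have no phi or no psi, respectively; Ramachandran statistics are computed over the remaining residues.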

Several alignments were tried, and several three-dimensional models were constructed using the MODELLER module within Insight II. The final model was also energy-minimized with the Discover3 module. The resulting models were verified with the online programs WHAT IF and ERRAT at the UCLA-DOE Structure Analysis and Verification Server (http://www.doe-mbi.ucla.edu/Services/SV/). Most of the analysed parameters had values within the range expected for a naturally folded protein.

We used a data mining algorithm for clustering: if two proteins have similar sequences, they are expected to fall into the same group, and for this grouping we used the SVM (Support Vector Machine) data mining technique. We wrote brute-force code in C++ to select the best template, i.e. the template sequence that gives the maximum number of matches with the unknown protein sequence. The structure of the unknown protein was then predicted with the SPDBV (Swiss-PdbViewer) tool. The structure was validated with a Ramachandran plot: if more than 90% of residues lie in the most favoured regions, the structure is considered valid; otherwise, the structure is modelled again.
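The authors' C++ template-selection code is not reproduced in the paper, so the following is only a Python sketch of one plausible brute-force approach. It assumes that "number of matches" means identical residues in an ungapped comparison, tried at every possible offset of the query against each template; the function names and template IDs are hypothetical.

```python
def best_offset_matches(query: str, template: str) -> int:
    """Max identical residues over all ungapped offsets of query vs template."""
    best = 0
    # Every shift at which the two sequences overlap by at least one residue.
    for shift in range(-(len(query) - 1), len(template)):
        matches = 0
        for i, residue in enumerate(query):
            j = i + shift
            if 0 <= j < len(template) and template[j] == residue:
                matches += 1
        best = max(best, matches)
    return best


def select_template(query: str, templates: dict[str, str]) -> tuple[str, int]:
    """Return (template_id, matches) for the template with the most matches."""
    if not templates:
        raise ValueError("no candidate templates")
    scores = {tid: best_offset_matches(query, seq) for tid, seq in templates.items()}
    best_id = max(scores, key=scores.get)  # ties keep the first template listed
    return best_id, scores[best_id]
```

For instance, `select_template("HEAGAW", {"T1": "AAAAAA", "T2": "XXHEAGAWXX"})` returns `("T2", 6)`. The cost grows with query length times template length for every template, which is acceptable for a short candidate list but is the reason alignment tools use dynamic programming or heuristics instead. Unlike an alignment score, raw match counts also ignore gaps and conservative substitutions.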

GENO3D:

The Geno3D server aims to give all biochemists and biologists access to an automated Web server for generating 3D protein models.

Geno3D generates information about the whole molecule, i.e. its energy, a Ramachandran diagram (computed with the PROCHECK software by Roman A. Laskowski), the deviation between models and the number of restraints deduced from the template.
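The "deviation between models" that Geno3D reports is a root-mean-square deviation, RMSD = sqrt((1/N) Σ |a_i - b_i|²) over N corresponding atoms. A minimal sketch, assuming the two coordinate lists are already superimposed and list the same atoms in the same order (the superposition step itself, e.g. a Kabsch fit, is not shown, and which atoms Geno3D includes is not stated here):

```python
import math

Vec3 = tuple[float, float, float]


def rmsd(model_a: list[Vec3], model_b: list[Vec3]) -> float:
    """RMSD between two equally long, already superimposed atom lists."""
    if len(model_a) != len(model_b) or not model_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(model_a, model_b)
    )
    return math.sqrt(squared / len(model_a))
```

A single atom displaced by (3, 4, 0) Å gives an RMSD of 5.0 Å; values below about 1 Å, like those reported for the three models, indicate closely similar backbones.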


MSYNQYPPPNQPPYPQYGAPQGFSQPPYPQQSGYGGFPLPQGHYNRPPPPAPSPGGYPPA
PGPGYPHSPPPQPPYGAPSQHPYPPQGGPGYPPPAGGYPQPGPYGAPPVGGSGYPTPPPQ
QFQGPPAMPSLGYVPGQMAPGDFRREADLLRKAMKGFGTDEKMLIQVLSKLDPLQMAAVR
STYTNHHHRDLYKDVKSETSSYFRQGLLAIIDGPLLHDVQSLREAVQGLGTKEWLLNDVV
LGRSNADLNAIKAAYEHTFHRSLQKDVEADLSFKTRSLFSLVLRAERHEPSYPINPQLIE
QEARAIHAATSGRVVNNVDEVCGIFARASDPELRAISQAFGARYNSSLESHIEKEFSGHM
KDALLHMLRTALDPAMRDADLLEDCMKGMGTKDEKLVTRVVRLHWNRQHLDQVKRAYHHR
YKRDLIARVRGETSGDYQKLMVALLE

NCBI BLAST was used for sequence analysis to determine which scoring matrix best predicts the alignment between the query and subject sequences. The analyses were carried out as described below.

BLAST Analysis (blastp program):

BLAST analysis was carried out using five scoring matrices (PAM30, PAM70, BLOSUM80, BLOSUM62 and BLOSUM45), and the results are shown below.

Matrix     %Identity  %Positives  Gaps (%)  Score (bits)  E-value  PDB ID
PAM30      31         49          4         317           5e-32    1M9I
PAM70      30         49          4         315           4e-83    1M9I
BLOSUM80   30         48          6         339           2e-36    1M9I
BLOSUM62   29         48          6         313           4e-35    1M9I
BLOSUM45   31         52          0         313           4e-36    1M9I

TABLE 1: BLAST analysis using five different matrices: % identity, % positives, gaps, score and E-value.

From the above analysis, PAM30 was selected as the scoring matrix because it combined the highest percentage identity (31%, equal to BLOSUM45) with a high score (317) and a low E-value (5e-32).
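The matrix choice can be made explicit as a ranking rule. The sketch below transcribes Table 1 and ranks matrices by percentage identity first and bit score second; that ordering of criteria is an assumption chosen to mirror the reasoning above, not a procedure stated by the authors.

```python
# Values transcribed from Table 1: (% identity, bit score, E-value).
TABLE_1 = {
    "PAM30": (31, 317, 5e-32),
    "PAM70": (30, 315, 4e-83),
    "BLOSUM80": (30, 339, 2e-36),
    "BLOSUM62": (29, 313, 4e-35),
    "BLOSUM45": (31, 313, 4e-36),
}


def select_matrix(results: dict[str, tuple[int, int, float]]) -> str:
    """Pick the matrix with the highest % identity, breaking ties by bit score."""
    return max(results, key=lambda name: (results[name][0], results[name][1]))


if __name__ == "__main__":
    print(select_matrix(TABLE_1))  # PAM30
```

The outcome depends on how the criteria are ordered: ranking by bit score alone would favour BLOSUM80 (339 bits), so the criterion should be stated alongside the chosen matrix.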

Model     Optimized energy (kcal/mol)  RMSD (Å)
Model 1   -14248.90                    0.9744
Model 2   -14401.90                    0.9852
Model 3   -14292.30                    0.9945

TABLE 2: Models with optimized energies and RMSD values.

In this study, the second model, which has the lowest energy (-14401.90 kcal/mol) and an RMSD of 0.9852 Å, was considered the best optimized and superimposed model among the three models. Further analysis was supported by Ramachandran plots.

Overall Structure:


Fig 1: Final structure

Ramachandran plot:

Fig 2: Ramachandran plot of the modeled structure

PROCHECK REPORT:

The generated models were validated with PROCHECK. The PROCHECK report includes Ramachandran (RC) plots for the generalized model and for the modelled protein anxC3.1.
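The acceptance rule stated in the Methods (more than 90% of residues in the most favoured regions) reduces to a percentage over region assignments. A minimal sketch, assuming each residue has already been assigned by PROCHECK to one of its four Ramachandran regions; the label strings below are hypothetical stand-ins for those categories.

```python
from collections import Counter

# Hypothetical labels for PROCHECK's four Ramachandran region categories.
REGIONS = ("most_favoured", "additional_allowed", "generously_allowed", "disallowed")


def region_percentages(labels: list[str]) -> dict[str, float]:
    """Percentage of residues falling in each Ramachandran region."""
    if not labels:
        raise ValueError("no residues to summarise")
    unknown = set(labels) - set(REGIONS)
    if unknown:
        raise ValueError(f"unknown region labels: {sorted(unknown)}")
    counts = Counter(labels)
    return {region: 100.0 * counts[region] / len(labels) for region in REGIONS}


def model_is_acceptable(labels: list[str], threshold: float = 90.0) -> bool:
    """True if strictly more than `threshold` % of residues are most favoured."""
    return region_percentages(labels)["most_favoured"] > threshold
```

With 94 of 100 residues in the most favoured regions the function returns `True`; with exactly 90 it returns `False`, because the rule is a strict "more than 90%".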


CONCLUSION:

To use any sequence alignment tool with different scoring matrices, one must assess which scoring matrices are most likely to conserve the physical and chemical properties necessary to maintain the structure and function of the protein. Pairwise sequence analysis with BLAST was employed to study the influence of the scoring matrix on detecting a clear evolutionary relationship. The analysis performed with the chick protein sequence database identified relevant homology with the PDB protein 1M9I.

Our results suggest that, although homologous sequences were obtained with all the scoring matrices, selecting a reasonable matrix for the three pairwise sequence alignment tools plays a major role in identifying an organized alignment pattern for predicting structural and functional relationships. Homology modelling with SPDBV, run under the Windows operating system, produced a number of models. The data are consistent with the model, which has a low RMSD (0.9852 Å), so it was taken as the best model.

The Ramachandran plots showed that the proportion of residues in the most favoured regions increased from 93.5% in the template structure 1ML6 to 94.4% in the modelled protein Q08392, and the number of residues in the disallowed regions was zero in both the modelled and template proteins. This study suggests that a fast and reliable homology model can be obtained by considering sequences with strong similarity at the sequence level, as the method employed is customizable and result-oriented.

REFERENCES:

[1] Sheehan D, Meade G, Foley VM, Dowd CA (2001) Structure, function and evolution of glutathione transferases: implications for classification of non-mammalian members of an ancient enzyme superfamily. Biochem J. Nov 15.

[2] Han J, Kamber M. Data Mining: Concepts and Techniques. Harcourt India (book).

[3] Hayes JD, Pulford DJ (1995) The glutathione S-transferase supergene family: regulation of GST and the contribution of the isoenzymes to cancer chemoprotection and drug resistance. Crit Rev Biochem Mol Biol 30(6):445-600.

[4] Attwood TK, Parry-Smith DJ (2001) Introduction to Bioinformatics. 1st edition, May 2001. Paperback textbook, 240 pp. ...

[5] Kidd LC, Woodson K, Taylor PR, Albanes D, Virtamo J, Tangrea JA (2003) Polymorphisms in glutathione-S-transferase genes (GST-M1, GST-T1 and GST-P1) and susceptibility to prostate cancer among male smokers of the ATBC cancer prevention study. Eur J Cancer Prev, 21-22.

[6] Jiao L, Bondy ML, Hassan MM, Chang DZ, Abbruzzese JL, Evans DB, Smolensky MH, Li D (2007) Glutathione S-transferase gene polymorphisms and risk and survival of pancreatic cancer. Cancer, No. 8-9.

[7] Gehlhaar DK, Verkhivker G, Rejto PA, Fogel DB, Fogel LJ, Freer ST (1995) Docking conformationally flexible small molecules into a protein binding site through evolutionary programming. Proceedings of the Fourth International Conference on Evolutionary Programming, No. 123-124.

[8] Laskowski RA, MacArthur MW, Moss DS, Thornton JM (1993) PROCHECK: a program to check the stereochemical quality of protein structures. J Appl Cryst 26, 283-291.

[9] Ramachandran GN, Sasisekharan V (1968) Conformation of polypeptides and proteins. Adv Protein Chem 23, 283-438.

[10] Sali A, Blundell TL (1993) Comparative protein modelling by satisfaction of spatial restraints. J Mol Biol 234, 779-815.

[11] Maeda DY, Mahajan SS, Atkins WM, Zebala JA (April 2006) Bivalent inhibitors of glutathione S-transferase: the effect of spacer length on isozyme selectivity. J Bioorganic & Medicinal Chemistry, 3780-3783.

[12] Altschul SF, Madden TL, Schaffer AA, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a