Understanding protein structure is of crucial importance in science, medicine and biotechnology. For about two decades, knowledge-based potentials based on pairwise distances – so-called “potentials of mean force” (PMFs) – have been center stage in the prediction and design of protein structure and the simulation of protein folding. However, the validity, scope and limitations of these potentials are still vigorously debated and disputed, and the optimal choice of the reference state – a necessary component of these potentials – is an unsolved problem. PMFs are loosely justified by analogy to the reversible work theorem in statistical physics, or by a statistical argument based on a likelihood function. Both justifications are insightful but leave many questions unanswered. Here, we show for the first time that PMFs can be seen as approximations to quantities that do have a rigorous probabilistic justification: they naturally arise when probability distributions over different features of proteins need to be combined. We call these quantities “reference ratio distributions” deriving from the application of the “reference ratio method.” This new view is not only of theoretical relevance but leads to many insights that are of direct practical use: the reference state is uniquely defined and does not require external physical insights; the approach can be generalized beyond pairwise distances to arbitrary features of protein structure; and it becomes clear for which purposes the use of these quantities is justified. We illustrate these insights with two applications, involving the radius of gyration and hydrogen bonding. In the latter case, we also show how the reference ratio method can be iteratively applied to sculpt an energy funnel. Our results considerably increase the understanding and scope of energy functions derived from known biomolecular structures.
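The idea of combining probability distributions over different features can be illustrated with a small discrete sketch. It assumes a fine-grained variable x with a sampling distribution Q(x), a coarse-grained feature y = f(x), and a target distribution P(y), and it reads the reference ratio construction as P(x) = Q(x) P(f(x)) / Q(f(x)), where the marginal Q(y) plays the role of the reference state. The function name `reference_ratio` and the toy numbers are illustrative assumptions and do not come from the text above.

```python
from collections import defaultdict


def reference_ratio(q_fine, feature, p_coarse):
    """Combine Q(x) with a target P(y), where y = feature(x).

    Returns P(x) = Q(x) * P(y) / Q(y), with Q(y) the marginal of Q over y.
    """
    # The reference distribution Q(y) is implied by Q(x) itself.
    q_coarse = defaultdict(float)
    for x, q in q_fine.items():
        q_coarse[feature(x)] += q
    return {
        x: q * p_coarse[feature(x)] / q_coarse[feature(x)]
        for x, q in q_fine.items()
    }


def marginal(p_fine, feature):
    """Marginal distribution of the coarse feature under p_fine."""
    out = defaultdict(float)
    for x, p in p_fine.items():
        out[feature(x)] += p
    return dict(out)


# Toy example (made-up numbers): four fine-grained states, two coarse bins.
q_fine = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
p_coarse = {0: 0.5, 1: 0.5}


def coarse_bin(x):
    return x // 2


combined = reference_ratio(q_fine, coarse_bin, p_coarse)
combined_marginal = marginal(combined, coarse_bin)
```

In this toy case the combined distribution stays normalized and its marginal over the coarse feature equals the target P(y), while the relative weights inside each coarse bin are still those of Q(x).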
The correlation between sequence and secondary structure scores and fragment RMSD to the native structure was also investigated. We observed that, once homologs are excluded from template databases, sampling at random from fragments that satisfy a score cutoff produces better results than extracting fragments exhaustively. We opted to employ a combination of both methods (random sampling and exhaustive sampling) in Flib. Exhaustive extraction is useful for finding high-scoring fragments that are likely to be good, whereas random methods increase the diversity of the final ensemble. We have observed that ranking fragments according to predicted torsion angles improved results. Previous results suggest that predicted torsion angles perform better than predicted secondary structure in assisting protein structure prediction. Fragments extracted from protein threading hits were also added to our fragment libraries. These fragments improved the accuracy of generated libraries and these fragment
performed for all proteins and at all digital resolutions. Even for several proteins it would take an unreasonable amount of measurement time, as obtaining a complete experimental set for backbone and side-chain resonance assignment and NOE distance restraint collection for the structure calculation of one protein at one resolution usually takes about one to two weeks. Even with already available chemical shift assignments, the measurement of two 3D NOESY experiments can take several days. Owing to these limitations, it is unrealistic to perform the study experimentally. We thus tried to model everything as realistically as possible: experimentally obtained chemical shift values for 13C- and 15N-resolved NOESY peak lists were taken from the BMRB database, and inter-atomic distances were derived from PDB structures and back-calibrated into peak volumes. All other experimental and NUS sampling parameters, such as spectral widths, numbers of time-domain points, and spectrometer carrier positions, were set equal to those used for previously performed experiments on medium-sized protein molecules [29,30]. A more detailed description of the experimental parameters used in modeling the spectral resolution is given in the Materials and Methods section and in Table S1. We perform automated NOESY cross peak assignment and structure calculations using distance restraints from the modeled NOESY peak lists and dihedral angle ranges obtained from backbone chemical shifts. Consequently, we provide qualitative results on how the digital resolution affects protein structure calculations. The effects of the digital resolution on the signal-to-noise ratio per unit of measurement time, the total number of peaks, peak overlap, and protein structure calculations were evaluated.
Much attention has recently been given to the statistical significance of topological features observed in biological networks. Here, we consider residue interaction graphs (RIGs) as network representations of protein structures with residues as nodes and inter-residue interactions as edges. Degree-preserving randomized models have been widely used for this purpose in biomolecular networks. However, such a single summary statistic of a network may not be detailed enough to capture the complex topological characteristics of protein structures and their network counterparts. Here, we investigate a variety of topological properties of RIGs to find a well fitting network null model for them. The RIGs are derived from a structurally diverse protein data set at various distance cut-offs and for different groups of interacting atoms. We compare the network structure of RIGs to several random graph models. We show that 3-dimensional geometric random graphs, which model spatial relationships between objects, provide the best fit to RIGs. We investigate the relationship between the strength of the fit and various protein structural features. We show that the fit depends on protein size, structural class, and thermostability, but not on quaternary structure. We apply our model to the identification of significantly over-represented structural building blocks, i.e., network motifs, in protein structure networks. As expected, choosing geometric graphs as a null model results in the most specific identification of motifs. Our geometric random graph model may facilitate further graph-based studies of protein conformation space and have important implications for protein structure comparison and prediction. The choice of a well-fitting null model is crucial for finding structural motifs that play an important role in protein folding, stability and function.
To our knowledge, this is the first study that addresses the challenge of finding an optimized null model for RIGs, by comparing various RIG definitions against a series of network models.
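The two graph constructions discussed above can be sketched in a few lines. This is a minimal illustration under stated assumptions: each residue is reduced to a single representative point, an edge is drawn when two points lie within a distance cut-off, and the geometric random graph places points uniformly in a unit cube. The function names, the toy coordinates and the cut-off values are hypothetical and are not the settings used in the study.

```python
import itertools
import math
import random


def contact_edges(points, cutoff):
    """Edges (i, j), i < j, between points at most `cutoff` apart."""
    return {
        (i, j)
        for (i, a), (j, b) in itertools.combinations(enumerate(points), 2)
        if math.dist(a, b) <= cutoff
    }


def geometric_random_graph(n_nodes, radius, seed=0):
    """3D geometric random graph: uniform points in the unit cube,
    connected whenever their Euclidean distance is at most `radius`."""
    rng = random.Random(seed)
    points = [(rng.random(), rng.random(), rng.random()) for _ in range(n_nodes)]
    return contact_edges(points, radius)


# Toy "residue" coordinates in angstroms (made up for illustration).
toy_residues = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (3.8, 5.0, 0.0)]
rig = contact_edges(toy_residues, cutoff=6.5)

# A null-model graph with the same number of nodes.
null_model = geometric_random_graph(n_nodes=len(toy_residues), radius=0.5)
```

Both graphs come from the same distance rule, which is one intuitive reason a geometric null model can resemble a RIG more closely than a model that preserves only the degree sequence.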
To load the set of CSV files into ProteinsDB, we used the SQL Server 2005 command-line utility BCP (Bulk Copy Program), which is quite efficient. To make the loading more efficient, we also turned off the foreign key constraints on the tables involved in this process. The total loading time for a PDB file is highly dependent on the protein size, but on average it takes about 0.3 seconds, with DSSP being responsible for the majority of the CPU time. The loading itself is mainly an I/O-bound task with little CPU overhead.
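The loading procedure can be sketched as follows. This is only an outline under assumptions: the table name `ProteinsDB.dbo.Atoms`, the file name and the server name are hypothetical, and the helpers build the command and the T-SQL statements without running them. The `bcp` switches shown (`-S`, `-T`, `-c`, `-t`) and the `ALTER TABLE ... NOCHECK CONSTRAINT ALL` statement are standard SQL Server features, but the exact options used by the authors are not stated.

```python
def bcp_import_command(table, csv_path, server):
    """Build (but do not run) a bcp command that bulk-loads one CSV file."""
    return [
        "bcp", table, "in", csv_path,
        "-S", server,  # target SQL Server instance
        "-T",          # trusted (Windows) authentication
        "-c",          # character-mode data
        "-t,",         # comma as the field terminator
    ]


def constraint_toggle_sql(table, enable):
    """T-SQL that disables or re-enables all foreign key and check
    constraints on a table."""
    if enable:
        # WITH CHECK re-validates existing rows when constraints come back on.
        return f"ALTER TABLE {table} WITH CHECK CHECK CONSTRAINT ALL;"
    return f"ALTER TABLE {table} NOCHECK CONSTRAINT ALL;"


# Hypothetical names, for illustration only.
table = "ProteinsDB.dbo.Atoms"
steps = [
    constraint_toggle_sql(table, enable=False),
    " ".join(bcp_import_command(table, "atoms.csv", "localhost")),
    constraint_toggle_sql(table, enable=True),
]
```

The list `steps` shows the order of operations: disable the constraints, bulk-load the file, then re-enable the constraints.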
From an ecological point of view, colicins are anticompetitor molecules (23). Although research on colicins has generated a wealth of information in terms of molecular genetics, mode of action and application, little is known about the natural ecology
Figure 2. Schematic representation of the mechanism of action of colicins: (A) pore-forming colicins, (B) colicins that cause hydrolysis of peptidoglycan, (C) nucleases. Colicins are produced and are inactivated (dark gray circle) by binding to their specific immunity protein, Imm (white hat). Colicins can be released by cell lysis in their active form (white circle). In the extracellular medium, this form can bind to a specific receptor on sensitive cells, translocate (light gray flattened circles) through the cytoplasm and exert its specific mode of action (A), (B) or (C). Producing cells can also recognize the active form of the colicin, but immediately after its translocation it is inactivated by binding to the immunity protein. Adapted from Baba and Scheneewind (4).
Protein structure and function studies require large quantities of pure protein. Many proteins are normally expressed in very low amounts, which precludes their isolation and purification. To produce larger quantities of a target protein, recombinant DNA technology is used. This process starts with DNA cloning by recombinant DNA methods. Molecular cloning is a set of experimental methodologies used to assemble recombinant DNA molecules and to direct their replication within a specific organism. It consists of inserting a DNA fragment into a vector that is then introduced into a host cell. The vector replicates and generates a large number of identical DNA molecules. In Escherichia coli (E. coli), an organism widely used for recombinant protein expression, plasmids are the most commonly used vectors. Plasmids are circular, double-stranded DNA molecules that occur naturally in bacteria and in lower eukaryotic cells. This DNA duplicates before every cell division, like chromosomal DNA, and the copies are transferred to the daughter cells. This process ensures vector propagation through cell generations. Different plasmid types have been engineered to optimize their use as vectors in DNA cloning. Insertion of a DNA fragment into the plasmid requires two types of enzymes: restriction enzymes and DNA ligase. Restriction enzymes are endonucleases that recognize specific sequences (restriction sites) and cleave DNA strands, a step known as DNA restriction. Restriction of the specific gene of interest (GOI) allows the isolation of the gene encoding the target protein. The GOI can be inserted into the plasmid by DNA ligase, which covalently joins the complementary ends of the fragment and the plasmid DNA.
quantitative analysis of the protein secondary structure for DDC, DDC-1, and DDC-2 are shown in Figure 6 and Table 3. In Figure 6, the four fitting peaks of DDC-2 remained the same as those of DDC, whereas those of DDC-1 shifted to lower wavenumbers. This revealed that the interaction between C=O and N–H in the protein main chain was strengthened after hydrolysis. The contents of α-helix, β-sheet, β-turn, and random turns in DDC-2 remained the same as in DDC (Table 3). The amount of random turns in DDC-1 decreased while the other secondary structures increased. This indicated that the protein structure became more stable after hydrolysis in a strongly acidic digestive system (Kanakis et al., 2011).
Taken together, our results suggest that the patterns of diversity within the T cell epitopes within the TSR domain of CS are in part determined by the relative location of polymorphic amino acids within the intact protein structure. While our data argue that inter-molecular interactions of the intact protein are likely key to the observed diversity, they do not exclude a role for the T cell responses that have been observed in exposed individuals. However, they do raise the possibility that T cell responses are not the primary driver of polymorphism. Given that the TSR domain is well conserved across species and found 187 times within the human proteome, the T cell responses observed may in large part be due to the divergent nature of the TH2 and TH3 region being recognized as non-self by the human host’s immune system. Thus, any functional impact that these regions
Figure 8. Calculated ΔΔG of observed polymorphic amino acid mutations from the ancestral amino acid residue, relative to the median of all possible mutations at each position. Free energy changes of polymorphisms in TH2 and TH3 are shown relative to the median change from all 19 substitutions from the predicted ancestral allele determined from Plasmodium sp. phylogeny. Mutations that have higher energy than the median are shown in red, while those with lower energy are shown in blue. Positive values represent increases in free energy and thermodynamic instability, while negative values represent a decline in free energy and greater stability. Neutral sequence where energetics have no effect would be expected to occur 50/50 above and below the median, while conservation of intramolecular function would be expected to minimize entropy and lead to lower energy states. Intermolecular interactions can lead to selection for less favorable states, which are significantly enriched in the observed polymorphisms (17 increased vs 5 decreased, p = 0.00845).
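The enrichment statistic quoted in the caption can be checked with a short calculation. The text does not name the test, so the following is an assumption: under the stated 50/50 null expectation, a one-sided exact binomial (sign) test on 17 increases versus 5 decreases gives a value that agrees with the reported p. The function name is illustrative.

```python
from math import comb


def one_sided_sign_test(n_above, n_below):
    """P(X >= n_above) for X ~ Binomial(n_above + n_below, 0.5)."""
    n = n_above + n_below
    # Count all outcomes at least as extreme as the observed split.
    tail = sum(comb(n, k) for k in range(n_above, n + 1))
    return tail / 2 ** n


p_value = one_sided_sign_test(17, 5)
print(f"{p_value:.5f}")  # prints 0.00845
```

A two-sided version of the same test would double this value, so the agreement suggests, but does not prove, that a one-sided test was used.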
The full-length protein sequence (including the N-terminal 19 amino acid signal peptide) was used for the protein model generation because no data are available that indicate in which form SP-G is present at its site of action. First attempts to obtain the 3D structure by homology modeling failed because there were no entries in the PDB with a sufficiently high sequence homology to SP-G. The threading method also did not lead to satisfying results, since the sequence of SP-G contains no conserved domains and secondary structure prediction for a sequence with only 78 amino acids is very complicated. Therefore, the sequence was submitted to the online server Robetta, which applies ab initio folding to obtain a structural model in a very time-consuming process. For the short SP-G sequence, however, results were expected in reasonable time. Indeed, the obtained model showed very promising quality and needed only minor optimizations. After energy minimization and MD refinement in YASARA, the PROCHECK evaluation shows a very good stereochemical quality of the protein model. Of the 78 amino acids, 95.5% are in the most favored regions; the remaining 4.5% show dihedral angle values in the additional allowed regions of the Ramachandran plot. The evaluation with PROSA shows a very good model quality as well. The plot of the combined pair and surface potential (Figure 2) is clearly negative for all regions of the protein, and the combined Z-score of -6.16 is close to the average value (-7.77) for proteins of this length. These validation results indicate a good native-like fold of the protein structure model.
The space of possible protein structures appears vast and continuous, and the relationship between primary, secondary and tertiary structure levels is complex. Protein structure comparison and classification is therefore a difficult but important task since structure is a determinant for molecular interaction and function. We introduce a novel mathematical abstraction based on geometric topology to describe protein domain structure. Using the locations of the backbone atoms and the hydrogen bonds, we build a combinatorial object – a so-called fatgraph. The description is discrete yet gives rise to a 2-dimensional mathematical surface. Thus, each protein domain corresponds to a particular mathematical surface with characteristic topological invariants, such as the genus (number of holes) and the number of boundary components. Both invariants are global fatgraph features reflecting the interconnectivity of the domain by hydrogen bonds. We introduce the notion of robust variables, that is, variables that are robust towards minor changes in the structure/fatgraph, and show that the genus and the number of boundary components are robust. Further, we investigate the distribution of different fatgraph variables and show how only four variables are capable of distinguishing different folds. We use local (secondary) and global (tertiary) fatgraph features to describe domain structures and illustrate that they are useful for classification of domains in CATH. In addition, we combine our method with two other methods, thereby using primary, secondary, and tertiary structure information, and show that we can identify a large percentage of new and unclassified structures in CATH.
Introduction: Advances in thyroid molecular biology studies provide not only insight into thyroid diseases but also accurate diagnosis of thyroid cancer. Objective: Design a tutorial on protein molecular modeling of genetic markers for thyroid cancer. Methods: The proteins were selected using the Protein Data Bank sequence and the basic local alignment search tool (BLAST) algorithm. The obtained sequences were aligned with the Clustal W multiple alignment algorithm. For the molecular modeling, three-dimensional structures were generated from this set of constraints with SWISS-MODEL, a fully automated protein structure homology-modeling server accessible via the ExPASy web server. Results: We demonstrated protein analysis, projection of the molecular structure and protein homology of the following molecular markers of thyroid cancer: receptor tyrosine kinase (RET) proto-oncogene; neurotrophic tyrosine kinase receptor 1 (NTRK1) proto-oncogene; phosphatase and tensin homolog (PTEN); tumor protein p53 (TP53) gene; phosphoinositide 3-kinase/threonine protein kinase (PI3K/AKT); catenin beta 1 (CTNNB1); paired box 8-peroxisome proliferator-activated receptor gamma (PAX8-PPARG); rat sarcoma viral oncogene (RAS); B-raf proto-oncogene, serine/threonine kinase (BRAF); and thyroid-stimulating hormone receptor (TSHR). Conclusion: This study shows the importance of understanding the molecular structure of the markers for thyroid cancer through bioinformatics and, consequently, of developing more effective new molecules as alternative tools for thyroid cancer treatment.
In this study, we adopted the interface definition based on the distance between any atoms across the interface. To find the optimal distance, we generated five interface libraries with different values of the distance: 6 Å, 8 Å, 10 Å, 12 Å and 16 Å (see Methods). Figure 1 shows an example of interface fragments in the 1bp3 complex corresponding to different cutoff distances. One can clearly see the gradual appearance of the secondary structure elements as the cutoff value increases. The interface of the first protein in the complex (blue ribbons in Figure 1) largely consists of two α-helices (residues G161–S184 and H18–Y28) interacting with a β-sheet (β-strands W272–V279 and D291–V297) and loop fragments (residues Y240–M248, K385–W391, L202–I209 and P329–E366) from the second protein (red ribbons in Figure 1). However, the fragment from the 6 Å library (Figure 1A) contains only a short fragment (residues D171–I179) of one of the α-helices, and the β-sheet structure of the second component is indiscernible, with only short fragments (S270–T274 and E292–Y294) visible. Such a representation is clearly inadequate for successful structural alignment that involves secondary structure elements. The fragment from the 8 Å library (Figure 1B) has a longer α-helix (D171–R183) in the first protein and a visible β-sheet-like structure in the second component, but the second α-helix of the first protein still remains obscure. The fragment from the 10 Å library (Figure 1C) already shows one full α-helix in the first protein and the complete β-sheet structure in the second protein. Yet, the second α-helix from the first protein (residues Q22–D26) is only partially visible. Only the fragment from the 12 Å library reveals the complete structural details of the interface (Figure 1D). Further increase of the distance leads to inclusion of significant non-interface parts of protein structure (the effect already seen in
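The distance-based interface definition described in this paragraph can be sketched as follows. This is a minimal illustration, assuming each chain is given as a mapping from a residue label to a list of atom coordinates; a residue belongs to the interface if any of its atoms lies within the cutoff of any atom of the partner chain. The function name, residue labels and coordinates are made up and do not correspond to 1bp3.

```python
import math


def interface_residues(chain_a, chain_b, cutoff):
    """Residues of each chain with any atom within `cutoff` of the other chain."""
    iface_a, iface_b = set(), set()
    for res_a, atoms_a in chain_a.items():
        for res_b, atoms_b in chain_b.items():
            # One close atom pair is enough to mark both residues.
            if any(math.dist(p, q) <= cutoff for p in atoms_a for q in atoms_b):
                iface_a.add(res_a)
                iface_b.add(res_b)
    return iface_a, iface_b


# Toy chains with one atom per residue (coordinates in angstroms, made up).
chain_a = {"A1": [(0.0, 0.0, 0.0)], "A2": [(20.0, 0.0, 0.0)]}
chain_b = {"B1": [(5.0, 0.0, 0.0)], "B2": [(0.0, 9.0, 0.0)]}

# The interface grows as the cutoff increases.
by_cutoff = {c: interface_residues(chain_a, chain_b, c) for c in (6.0, 10.0, 16.0)}
```

In the toy data, raising the cutoff from 6 Å to 16 Å pulls in progressively more residues, which mirrors the trade-off described above between capturing complete secondary structure elements and including non-interface parts.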
Validation of experimental interactions using the triangle rate score. We can also consider the converse, using the triangle rate score to validate a stated interaction, with the aim of identifying potential false positives. We examined our lowest scoring 5% (4,355 protein pairs), 49 of which are found in DIP Yeast. Among these 49 pairs, 42 do not share the same function. There are 11 pairs that share neither function nor subcellular location. One example is the interaction between “Protein TEM1” (TEM1) and “Long-chain-fatty-acid–CoA ligase 4” (FAA4). The database entry is based on yeast two-hybrid experiments, a particularly error-prone experimental technique. While TEM1 is located in the cytoskeleton, endoplasmic reticulum, or punctate composite, FAA4 is in the cytoplasm. In terms of functional categories, TEM1 is involved in nucleotide binding and in hydrolase activity, and FAA4 in long-chain-fatty-acid-CoA ligase activity. These two proteins are located differently and share no common function, which raises the question of whether they indeed interact. False positive interactions could arise for several reasons, such as autoactivation of reporter transcription by the bait protein alone. We suggest that a small-scale experiment should be carried out on this specific protein pair.
Studies that include both experimental data and computational simulations (in silico) have increased in number because the techniques are complementary. In silico methodologies are currently an essential component of drug design; moreover, identification and optimization of the best ligand based on the structures of biomolecules are common scientific challenges. Geometric structural properties of biomolecules explain their behavior and interactions and when this information is used by a combination of algorithms, a dynamic model based on atomic details can be produced. Docking studies enable researchers to determine the best position for a ligand to bind on a macromolecule, whereas Molecular Dynamics (MD) simulations describe the relevant interactions that maintain this binding. MD simulations have the advantage of illustrating the macromolecule movements in more detail. In the case of a protein, the side chain, backbone and domain movements can explain how ligands are trapped during different conformational states. Additionally, MD simulations can depict several binding sites of ligands that can be explored by docking studies, sampling many protein conformations. Following the previously mentioned strategy, it is possible to identify each binding site that might be able to accommodate different ligands through atomic motion. Another important advantage of MD is to explore the movement of side chains of key catalytic residues, which could provide information about the formation of transition states of a protein. All this information can be used to propose ligands and their most probable site of interaction, which are daily tasks of drug design. In this review, the most frequent criteria that are considered when determining pharmacological targets are gathered, particularly when docking and MD are combined.
Since neither MOE’s built-in automated structural modeling tool nor the Swiss-model server had predicted a homology model, we decided to use a manual procedure. First, the spatial coordinates of the 1IYM structure (22), which showed the highest sequence homology with the query (23% after manual editing, Table 1), were used as a rigid scaffold for the main part of the query. The side chains were interchanged (mutation simulation) to reflect the target sequence (Figure 1). Second, the original conformations were kept at identical positions, conserved positions were mutated preferentially respecting the template orientation, whereas non-conserved side chain conformations were locally refined. Side chain conformations were usually constructed by means of molecular mechanics with empirical energy refinements towards local minima. Alternatively, the web-based SCWRL server or its download version could be used to predict conformations (7). The loop database of SPV was also consulted before completing the atom-scale model with MOE tools under AMBER-94 force-field conditions (29). Due to the high degree of local conservation, especially with respect to the amino acids forming the structural RING motif, the backbone fold was obeyed for the entire length of 64 residues. Third, amino acids which are possible salt bridge partners due to charge and position were searched among the remaining resi-
Revealing functional units in protein-protein interaction (PPI) networks is important for understanding cellular functional organization. Current algorithms for identifying functional units mainly focus on cohesive protein complexes, which have more internal interactions than external interactions. Most of these approaches do not handle overlaps among complexes since they usually allow a protein to belong to only one complex. Moreover, recent studies have shown that other non-cohesive structural functional units beyond complexes also exist in PPI networks. Thus, previous algorithms that focus only on non-overlapping cohesive complexes are not able to present the biological reality fully. Here, we develop a new regularized sparse random graph model (RSRGM) to explore overlapping and various structural functional units in PPI networks. RSRGM is principally dominated by two model parameters. One is used to define the functional units as groups of proteins that have similar patterns of connections to others, which allows RSRGM to detect non-cohesive structural functional units. The other is used to represent the degree to which proteins belong to the units, which supports a protein belonging to more than one revealed unit. We also propose a regularizer to control the smoothness between the estimators of these two parameters. Experimental results on four S. cerevisiae PPI networks show that the performance of RSRGM in detecting cohesive complexes and overlapping complexes is superior to that of previous competing algorithms. Moreover, RSRGM has the ability to discover biologically significant functional units besides complexes.
Our results further show that Rep68 is functional as an octameric helicase, and we propose that both helicase rings may be active in this bidirectional complex. Although the proposed structure might have implications for our current replication model, the exact role of a double-octameric Rep68 in AAV DNA replication and/or site-specific integration remains to be determined. However, several scenarios are plausible. The current model for AAV DNA replication does not envision bidirectional replication, as has been proposed for the SV40 and papilloma viruses. These viruses have a double-stranded DNA origin that contains two inverted repeats that are both recognized by the respective initiator protein. In contrast, AAV contains a single repeat (the RBS) in each ITR. Using LTag and E1 as precedents, Rep68 would be expected to require two inverted repeats in order to assemble a double octamer. In view of biochemical evidence suggesting that Rep68 can form ternary complexes with 2 AAV ITRs, the Rep68 double octamer may coordinate the resolution of two ITR molecules (as may be the case in intermolecular unwinding). Another interesting scenario is the requirement of a double octamer during the refolding of ITR structures after completion of the ITR resolution and its subsequent duplication. Interestingly, two inverted RBS sequences are obtained after these steps, and, in theory, Rep68/78 proteins have the potential to recognize them and initiate their melting, followed by the formation of a double octamer, which would allow not only the refolding of the ITR structures but also the unwinding of the AAV dsDNA required for the following rounds of replication. Identifying the exact role of the Rep68 double octamer during the AAV life cycle, as well as its structural characterization, will help to understand how Rep68 functions during the unwinding reaction.
A scan of the PIR-NREF database (Barker et al., 2001) against the -cinnamomin sequence matched 32 sequences with significant homology scores, 26 of which are unique. The sequences were aligned with the multiple alignment program CLUSTALX (Thompson et al., 1994) and a cluster analysis based on the identity matrix was performed with the MODELLER program (Sali & Blundell, 1993). Almost all primary sequences show pairwise identities above 65% and the sequence alignment reveals several highly conserved regions (residues 7–9, 15–20, 31–38, 40–43, 50–51, 53–56, 69–85 and 95–98; Fig. 1). This high level of identity suggests a very similar three-dimensional structure and makes the elicitins excellent candidates for homology modelling (Marti-Renom et al., 2000).
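The pairwise identities and the identity matrix mentioned above can be illustrated with a short sketch. It assumes the sequences are already aligned to equal length with `-` as the gap character, and it defines identity as the fraction of matching residues over the columns where both sequences have a residue. CLUSTALX and MODELLER may use a different denominator, so the numbers are illustrative only; the function names and toy sequences are made up.

```python
def pairwise_identity(aligned_a, aligned_b, gap="-"):
    """Fraction of identical residues over columns without a gap in either sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    return sum(a == b for a, b in columns) / len(columns)


def identity_matrix(alignment):
    """Symmetric matrix of pairwise identities for a list of aligned sequences."""
    return [[pairwise_identity(a, b) for b in alignment] for a in alignment]


# Toy alignment (made-up sequences, not elicitins).
toy_alignment = ["ACDE-", "ACDFG", "ACDEG"]
matrix = identity_matrix(toy_alignment)
```

Such a matrix is the kind of input that a clustering step would consume, and a threshold such as the 65% quoted above can be checked directly against its off-diagonal entries.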