The results presented in this work can be considered a strong argument in favor of using Local Rank Distance (LRD) for computational biology tasks, in order to obtain results that are often more accurate from a biological point of view. Local Rank Distance is related to the rearrangement distance. The rearrangement distance works with indexed k-mers and is based on a process of converting one string into another, in a similar fashion to the edit distance. Unlike the edit distance or the rearrangement distance, LRD does not impose such global constraints. Instead, LRD tries to capture only the local changes in DNA. This seems more natural from an evolutionary point of view, since changes in DNA, such as point mutations or indels, occur at the local level. Perhaps this is the key insight into why Local Rank Distance should be expected to give more accurate results than the other distance measures. For instance, the edit distance counts the minimum number of operations required to transform one string into the other, yet the actual number of DNA changes that occurred may be higher than this minimum. The Hamming distance sides with Local Rank Distance regarding the local aspect. However, the Hamming distance is greatly affected by indels: a single character inserted into (or deleted from) one of the two strings corrupts the Hamming distance computation for the rest of the string. Local Rank Distance, on the other hand, is more robust to changes such as indels or duplications, since it sums up the positional offsets of identical k-mers. When two DNA sequences are identical, the positional offsets of identical k-mers sum to zero. If the two DNA sequences are affected by various types of DNA changes, the positional offsets of identical k-mers increase mostly in the affected DNA regions. Consequently, the Local Rank Distance will be higher, since it finds displaced k-mers.
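The idea of summing positional offsets of identical k-mers can be illustrated with a minimal sketch. It is a simplified reading of the description above, not the reference LRD implementation: the function names, the default k-mer length `k`, the cap `max_offset` (also charged when a k-mer has no match), and the choice to sum both directions are all assumptions made for illustration.

```python
def kmer_positions(s, k):
    """Map each k-mer of s to the list of its start positions."""
    positions = {}
    for i in range(len(s) - k + 1):
        positions.setdefault(s[i:i + k], []).append(i)
    return positions


def directed_offset(x, y, k, max_offset):
    """Sum, over all k-mers of x, the offset to the nearest identical k-mer in y.

    Each offset is capped at max_offset, which is also the cost charged
    when the k-mer does not occur in y at all (assumed convention).
    """
    y_positions = kmer_positions(y, k)
    total = 0
    for i in range(len(x) - k + 1):
        candidates = y_positions.get(x[i:i + k], [])
        nearest = min((abs(i - j) for j in candidates), default=max_offset)
        total += min(nearest, max_offset)
    return total


def local_rank_distance(x, y, k=3, max_offset=10):
    """Symmetric sum of the two directed offset totals (illustrative only)."""
    return (directed_offset(x, y, k, max_offset)
            + directed_offset(y, x, k, max_offset))


if __name__ == "__main__":
    original = "ACGTTGCA"
    with_insertion = "AACGTTGCA"  # one base inserted at the front
    print(local_rank_distance(original, original))
    print(local_rank_distance(original, with_insertion))
```

In this toy case the single inserted base shifts every shared 3-mer by one position, so each contributes a small offset rather than a full mismatch, which is the behavior the paragraph contrasts with the Hamming distance.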
When more point mutations, indels, reversals or other kinds of errors occur in the DNA, LRD will indicate an even higher distance between the DNA sequences. Intuitively, Local Rank Distance reflects the total amount of local changes between two DNA sequences. This intuition can be better observed in Figure 3, which shows how the Local Rank Distance between two DNA sequences changes when one of the two sequences is affected by different types of DNA polymorphisms. Another key insight of why the rank-based approach should work better is that Local Rank Distance can capture very fine differences between strings, unlike
Phylogenetic relationships inferred using ssrRNA, gGAPDH and cytB generated trees with similar topologies and were also congruent with results based on cytB sequences. Three major clades of bat trypanosomes within the subgenus Schizotrypanum were strongly supported in all phylogenies, regardless of data sets and analytical methods, in which the clade containing T. cruzi was closer to that containing T. c. marinkellei than to T. dionisii. No species of Schizotrypanum other than those mentioned above were isolated from bats in this study, suggesting that other species of this subgenus are rare in this area of Bolivia and/or difficult to cultivate. Closest to the T. cruzi clade is T. rangeli, another American trypanosome of wild mammals that is also transmitted by triatomine bugs but is rarely found in bats, except in Brazil. Only two cultures of T. rangeli from bats have been confirmed using morphological, biological and molecular parameters. Phylogeographical, ecological and biological analyses of isolates classified as Schizotrypanum disclosed some patterns of association with bat species, biomes and geographic origin, as well as with their behavior in culture, triatomine bugs and mice. Our results show overlapping geographic areas of the two Schizotrypanum
Tip100 is an Ac-like transposable element that belongs to the hAT superfamily. First discovered in Ipomoea purpurea (common morning glory), it was classified as an autonomous element capable of movement within the genome. As Tip100 data were already available in databases, the sequences of related elements in ten additional species of Ipomoea and five commercial varieties were isolated and analyzed. Evolutionary analysis based on sequence diversity in nuclear ribosomal Internal Transcribed Spacers (ITS) was also applied to compare the evolution of these elements with that of Tip100 in the Ipomoea genus. Tip100 sequences were found in I. purpurea, I. nil, I. indica and I. alba, all of which showed high levels of similarity. The results of phylogenetic analysis of transposon sequences were congruent with the phylogenetic topology obtained for ITS sequences, thereby demonstrating that Tip100 is restricted to a particular group of species within Ipomoea. We hypothesize that Tip100 was probably acquired from a common ancestor and has been transmitted vertically within this genus.
In the above experiment, DendroBLAST was compared to other inference methods using simulated multiple sequence alignments with the addition of alignment-induced error. It is common in phylogenetic analysis for alignments to be trimmed before use. Trimming removes positions that are suspected to contain misaligned sequence and hence could lead to phylogenetic error. However, trimming also reduces the amount of data available to make the inference and hence can negatively affect phylogenetic inference through data reduction. Here, a commonly used package for alignment trimming, GBLOCKS, was used to trim the re-aligned multiple sequence alignments using a conservative (less data removed) setting. In all cases, trimming the re-aligned multiple sequence alignments reduced the inference performance of alignment-based methods (Table 3). This effect was more pronounced on the alignments that contained higher error rates (Table 3). This result agrees with similar findings which suggest that removing data using methods like GBLOCKS does not always improve the accuracy of phylogenetic inference [40,41].
Direct sequence analysis of a sufficiently long portion of the NS5B gene followed by phylogenetic analysis is the reference method for identification of HCV genotype and subtype [1,25]. It was used to identify the HCV genotype and subtype in 516 treatment-naïve patients included in a multicenter clinical trial assessing different schedules of pegylated IFN-α2a and ribavirin. All of these patients were thought to be infected with HCV genotype 1 at inclusion based on local assessment. In fact, 6 patients were infected with genotype 6, including 2 with subtype 6e, one with subtype 6o, one with subtype 6p, one with subtype 6q and one with subtype 6r. These 6 samples were not considered for further analysis in the present study. The remaining 510 patients were confirmed to be infected with HCV genotype 1: 237 of them (46.5%) were infected with HCV subtype 1a and 263 (51.6%) with subtype 1b (Figure 1). As shown in Figure 1, HCV subtype 1a strains segregated into two distinct clades, which were termed 1a clade I (n = 83, 35.0%) and 1a clade II (n = 154, 65.0%). Eight patients (1.6%) were infected with another HCV genotype 1 subtype, including 4 patients with subtype 1d, 2 with subtype 1e, one with subtype 1i, and one with subtype 1l. The remaining 2 patients (0.3%) were infected with genotype 1 but the subtype could not be determined. The ability of the different molecular methods to correctly identify HCV subtypes 1a and 1b was then tested on the 237 and 263 samples containing HCV subtypes 1a and 1b, respectively.
The reconstructed neighbor-joining (NJ) phylogenetic tree, based on the 303 nt L segments of 7 examined DOBV sequences, compared sequences originating from different European countries, including Greece, Slovenia, Slovakia, Germany, Russia, Estonia and Serbia (Fig. 1). In the NJ tree, the newly detected sequence, isolated from A. agrarius, was placed together with sequences isolated from A. flavicollis with 99% bootstrap support. The other three sequences isolated from A. agrarius formed a distinct clade on the phylogenetic tree. The positioning of the Serbian sequence on the phylogenetic tree could possibly reflect local host switching of DOBV between A. flavicollis and A. agrarius. This possibility is supported by the fact that these two Apodemus species are known to share the same habitat. In addition, previous studies have already described spillover for DOBV and TULV (Schlegel et al., 2009; Schmidt-Chanasit et al., 2010; Schlegel et al., 2012). In this study, rodent identification was performed based on morphology. The two species in question (A. flavicollis and A. agrarius) are clearly morphologically distinct; however, genetic confirmation of species determination, based on the mitochondrial cytochrome b (cyt b) gene, would be a useful complement.
clade is particularly interesting as it contains mainly animal-pathogenic species. One of them, Coccidioides immitis, was initially classified as a protist, but further research showed it to be a fungus, and separate studies placed it in three different divisions of the former group named Eumycota (Rixford and Gilchrist, 1896; Ophuls, 1905; Ciferri and Redaelli, 1936; Baker et al., 1943). Subsequent ribosomal phylogeny studies (Pan et al., 1994; Bowman et al., 1996) suggested a close phylogenetic relationship between C. immitis and U. reesii, excluding H. capsulatum. The phylogenies based on the Rlm1 and Mcm1 protein/gene agree with the placement of C. immitis, C. posadasii and U. reesii as sister taxa, representing two families, Gymnoascaceae and Onygenaceae. However, a conflict was observed in the placement of Ascosphaera apis, which formed a clade with Microsporum gypseum in the Rlm1 protein/gene analysis but appears as a basal taxon in Eurotiomycetes in the Mcm1 protein analysis. The obtained results disagree with the Eurotiomycetes phylogeny study in which Ascosphaera apis (Ascosphaeraceae) formed a sister clade in the Ajellomycetaceae (H. capsulatum, P. brasiliensis and B. dermatitidis) and Microsporum gypseum formed a sister clade of Gymnoascaceae (Geiser et al., 2006). The Eurotiomycetes branch containing the Eurotiales clade supported the close relationship between Talaromyces stipitatus and Penicillium marneffei (Figures 8, 9 and 10). A minor difference in the Aspergillus clade was observed between the phylogenetic analyses with Rlm1 and Mcm1 regarding the position of A. nidulans, A. terreus, A. niger and A. fumigatus (Figures 8, 9 and 10). In this study, Neosartorya fischeri and A. fumigatus formed well-supported sister taxa, as did A. oryzae and A. flavus, in accordance with the studies of Fitzpatrick et al. (2006) and James et al. (2006).
haemagglutinin-neuraminidase protein (571 aa) is also characteristic of virulent strains [46,60,61]. The prediction of virulence based on the F cleavage site pattern was confirmed by in vivo tests that resulted in a high ICPI value (1.9), close to the maximum of 2, with the two strains tested (MG-1992 and MG-725/08). In spite of the established virulence of the MG-725/08 strain, it was recovered from both cloacal and tracheal swabs from an unvaccinated and apparently healthy chicken. However, the possibility that this chicken was sampled during the incubation period of the infection, or was partially protected by a previous infection with an avirulent or vaccine strain circulating in the field, cannot be ruled out.
Figure 2. Phylogenetic tree (unrooted) of nucleotide sequences based on a 374-nt sequence (position 47–421 nt) of the F gene. Phylogenetic relationships of MG group strains with previously published sequences in GenBank. The evolutionary history was inferred using the Neighbor-Joining method. All results are based on the pairwise analysis. Analyses were conducted using the Kimura 2-parameter method in MEGA4 [31,32] with 1,000 bootstraps. The isolates from Madagascar that were subjected to analysis in this work are in bold and red. Genotype or lineage groupings are indicated on the right.
Phylogenetic analyses were conducted on both the nucleotide coding sequences and the predicted amino acid sequences aligned with MAFFT v6.864b using the L-INS-i strategy for accuracy. Maximum likelihood (ML) was used as the optimality criterion, and optimal nucleotide and amino acid substitution models were determined with MEGA5, PROTTEST v2.4 and the likelihood-ratio method. Tree searches were conducted with both PAUP 4.0a123 and MEGA5 under the Tamura-Nei 93 nucleotide substitution model (TN93), or either the Whelan and Goldman (WAG) or the Jones, Taylor and Thornton (JTT) amino acid substitution models. Substitution models were combined with empirical estimates of nucleotide/amino acid frequencies, gamma-distributed among-site rate variation, and an estimate of the proportion of invariant sites. Positions containing gaps were excluded from the analysis. Bootstrapping was performed with 500 replicates. The dataset for Figure S1 was composed of 66 glucosyltransferase/glucansucrase sequences from bacteria in the genera Streptococcus, Lactobacillus, Leuconostoc, Oenococcus and Weissella. The dataset for Figure 1 was composed of 39 streptococcal glucosyltransferase sequences representing 16 species. The tree was constructed based on the predicted amino acid sequence of the conserved catalytic domain of Gtfs (positions 166–934 in S. mutans GtfB) and rooted with the dextransucrase DsrP from L. mesenteroides, chosen as an outgroup on the basis of its position in the tree presented in Figure S1. A consensus tree was generated by collapsing the branches with less than 50% bootstrap support, and thus the branch lengths are not shown.
Although tyrosine phosphorylation reportedly occurs at any of the three motifs (TPM-A, TPM-B and TPM-C), detailed sequence analysis of their individual distribution proved to be of no prognostic value, as no site-specific mutation in any of the three tyrosine phosphorylation motifs was observed to be directly associated with disease status (Figure S2 a, b, c), thereby implying the outcome to be TPM-independent. The presence of KNEPIY in place of KNS(T/g)EPIY at site 899 in the strains studied requires further investigation on a large number of samples from different geographical locations of the Indian subcontinent (Figure 3). These observations are in full agreement with those reported by Owen et al. (2003), who demonstrated that there is no association between the number and type of TPMs present and the severity of the disease. Nonetheless, they reported relatively lower frequencies for all three TPMs (Owen et al., 2003) than was the case in the present study. Another study, from Costa Rica (Occhialini et al., 2001), reported relative frequencies of 100% and 58% for TPM A and B, respectively, but was unable to detect the TPM C motif in any of the strains studied. The reason for such discordance between studies is unclear, warranting detailed investigation of large numbers of clinical isolates from several geographical areas. Although, according to MJ networks, TPM genes may not be pathogenetically relevant, an attempt was made to understand whether haplotypes play any specific, associated role in altering the outcome of a disease. Our high-resolution study based on MJ networks in 32 H. pylori strains showed this was not so.
The amplification products corresponding to the ITS region of the rDNA of all Colletotrichum spp. isolates were purified using the QIAquick Gel Extraction Kit 250 (Qiagen, Hilden, Germany). The sequencing reactions of the 5.8S-ITS region were carried out with the ABI PRISM Big Dye Terminator Cycle Sequencing Ready Reaction kit v.3.1 (Applied Biosystems, Foster City, California, USA) and processed in the automated DNA sequencer ABI PRISM 3100 Genetic Analyser (Applied Biosystems - Hitachi). After sequencing of the ITS region, an in silico comparison was performed: the sequences obtained were edited with SeqScape 2.5 software (Applied Biosystems) and used to search for similar sequences with the BLASTN software (ALTSCHUL et al., 1990) at NCBI (http://www.ncbi.nlm.nih.gov). The edited sequences were then aligned in ClustalW, and a Neighbor-Joining phylogenetic tree was constructed from Kimura 2-parameter pairwise distances using the Molecular Evolutionary Genetics Analysis software MEGA 3.1 (SAITOU & NEI, 1987). The consistency of phylogenetic resolution was supported by bootstrap analysis using 1,000 replicates. The DNA sequences obtained from Colletotrichum isolates and from GenBank were used for phylogenetic analysis, with a preliminary species identification based on the best matches from GenBank accessions, in order to estimate the ability of the ITS sequence to accurately pinpoint species identity using database matches, e-values, similarity scores and the distance tree of search results constructed by the GenBank interface (minimum evolution option).
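The Kimura 2-parameter distance that feeds the Neighbor-Joining tree can be sketched as follows. This is an illustrative calculation from the standard formula d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions; it is not the MEGA implementation, and the function name and the choice to skip gapped or ambiguous sites pairwise are assumptions.

```python
import math

# Purine<->purine and pyrimidine<->pyrimidine changes are transitions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a gap or an ambiguous base are
    skipped (pairwise deletion, assumed here for simplicity).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if (a, b) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared    # proportion of transitions
    q = transversions / compared  # proportion of transversions
    term_a = 1 - 2 * p - q
    term_b = 1 - 2 * q
    if term_a <= 0 or term_b <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term_a * math.sqrt(term_b))


if __name__ == "__main__":
    # Toy aligned fragments (hypothetical, not from the study).
    print(k2p_distance("ACGTACGTAC", "GCGTACGTAC"))
```

A full matrix of such pairwise distances is what a Neighbor-Joining routine takes as input; bootstrap support is then obtained by resampling alignment columns and repeating the calculation.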
The last twelve figures all relate to publications, which are the most important element used to evaluate academic ranking. Elements such as the number of publications, citations, citations per document, impact factor, h-index and so on all relate to publications, and they evaluate publications both quantitatively and qualitatively. As already discussed in the introduction, there are other important elements that help us evaluate the academic level of organizations or countries, such as the number of doctoral degrees, the number of academic staff, institute income, research income, and international collaboration by that organization or country. In this section we discuss these issues in detail.
phycoerythrin. These structures are responsible for absorbing energy from light, which is then transferred to chlorophyll molecules (Glazer, 1985; Mullineaux, 2008). Therefore, phycobilisome structure determines the light spectra that can be used by a given organism, and consequently its capacity to photosynthesize in different environments. Eleven marine Synechococcus strains have had their phycobilisome structures analyzed and compared, revealing that even within this group of closely related organisms there is remarkable diversity in the light-absorption apparatus (Six et al., 2007b). The functioning of the light-harvesting apparatus of lineages WH8102, RS9917 and RCC307, and its tolerance of fluctuations in irradiance, have been shown to differ among these lineages and also from those of Prochlorococcus. These differences are thought to be associated with niche partitioning between these organisms, which make use of distinct light spectra for photosynthesis (Six et al., 2007a).
Bacterial expression - The DNAs encoding the complete VP1 (the major viral antigen) and parts of the 2A and VP3 HAV proteins were cloned and expressed in E. coli. A 1.2 kb fragment (amino acids 461-860) was obtained by enzymatic digestion from the 42pGEMVP1 clone. After digestion with Eco RI and Hind III (restriction sites present in plasmid pGEM2), the fragment was inserted into the expression vector pET28a (Novagen). The resulting plasmid was referred to as VP1pET42u. In addition, a vector control of the expression procedure, plasmid VP1pET85t, was employed. It consisted of another pET construct with the same fragment, not in frame. For bacterial expression of the recombinant protein, E. coli strain BL-21 (DE3) was transformed using the pET28a constructs. The cultures were incubated at 37°C in Luria broth containing kanamycin (50 µg/ml) until OD600 = 0.8. For the induction step, we added isopropyl-β-D-thiogalactopyranoside (IPTG) to a final concentration of 0.2 mM. The cultures were incubated overnight at 39°C. Bacteria were pelleted by centrifugation, resuspended and extracted by ultrasonication using 0.1 mM of lysozyme. The proteins were precipitated with acetone, solubilized with 6 M urea and concentrated with 30% ammonium sulphate. The expressed recombinant protein was submitted to electrophoresis in a 12% polyacrylamide gel (SDS-PAGE) and tested by Western blot, using a human convalescent-phase anti-HAV serum.
Predatory bacteria seek and consume other live bacteria. Although belonging to taxonomically diverse groups, relatively few bacterial predator species are known. Consequently, it is difficult to assess the impact of predation within the bacterial realm. As no genetic signatures distinguishing them from non-predatory bacteria are known, genomic resources cannot be exploited to uncover novel predators. In order to identify genes specific to predatory bacteria, we developed a bioinformatic tool called DiffGene. This tool automatically identifies marker genes that are specific to phenotypic or taxonomic groups, by mapping the complete gene content of all available fully-sequenced genomes for the presence/absence of each gene in each genome. A putative ‘predator region’ of ~60 amino acids in the tryptophan 2,3-dioxygenase (TDO) protein was found to probably be a predator-specific marker. This region is found in all known obligate predator and a few facultative predator genomes, and is absent from most facultative predators and all non-predatory bacteria. We designed PCR primers that uniquely amplify a ~180bp-long sequence within the predators’ TDO gene, and validated them in monocultures as well as in metagenetic analysis of environmental wastewater samples. This marker, in addition to its usage in predator identification and phylogenetics, may finally permit reliable enumeration and cataloguing of predatory bacteria from environmental samples, as well as uncovering novel predators.
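The presence/absence mapping described above can be pictured with a short sketch. This is not the DiffGene implementation: the function name, the two fraction thresholds, and the genome and gene identifiers in the example are all hypothetical, chosen only to show how a group-specific marker falls out of a gene-content table.

```python
def find_marker_genes(presence, target_group,
                      min_target_fraction=1.0, max_other_fraction=0.0):
    """Return genes present in the target group and absent elsewhere.

    presence: dict mapping genome name -> set of gene identifiers.
    target_group: set of genome names forming the group of interest.
    A gene qualifies when it occurs in at least min_target_fraction of
    target genomes and at most max_other_fraction of all other genomes.
    """
    targets = [g for g in presence if g in target_group]
    others = [g for g in presence if g not in target_group]
    if not targets or not others:
        raise ValueError("need at least one genome inside and outside the group")

    all_genes = set().union(*presence.values())
    markers = []
    for gene in sorted(all_genes):
        in_targets = sum(gene in presence[g] for g in targets) / len(targets)
        in_others = sum(gene in presence[g] for g in others) / len(others)
        if in_targets >= min_target_fraction and in_others <= max_other_fraction:
            markers.append(gene)
    return markers


if __name__ == "__main__":
    # Hypothetical gene content for four genomes.
    presence = {
        "predator_1": {"tdo_region", "rpoB"},
        "predator_2": {"tdo_region", "rpoB", "geneX"},
        "nonpredator_1": {"rpoB"},
        "nonpredator_2": {"rpoB", "geneX"},
    }
    print(find_marker_genes(presence, {"predator_1", "predator_2"}))
```

Relaxing `max_other_fraction` slightly would tolerate a marker that also appears in a few genomes outside the group, which matches the situation reported for a few facultative predators.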
Further analysis of the calculated EFMs shows that only approximately 1/50th of the original set of EFMs is still fully active after CHLAMY1 binding. Therefore, the metabolic flux through the system is considerably reduced during the night-time, which is in line with the reduction of carbon and energy consumption when photosynthesis is inactive. This holds independently of the carbon source chosen. Particularly EFMs with a low yield are suppressed, so that the average yield increases. If CHLAMY1 binding is reduced at the end of the night resulting in the expression of target enzymes at the beginning of the day when photosynthetic energy is again available, the metabolic capability and robustness of nitrogen metabolism is greatly increased and allows fast incorporation of nitrogen into the organism. As energy is no longer limiting, there is no need to restrict to those reactions with high yields and low energy consumption. Therefore, CHLAMY1 binding during the night appears to ensure energy conservation while still allowing nitrogen fixation. Due to the stabilisation of mRNA by CHLAMY1 and release at the end of the night [23,25,26], it furthermore enables a high metabolic capacity as soon as enough energy is available.
an estimate of the performance of the second stage, we carried out a study in which we compared the 20 top profile HMMs from the Pfam-A database with our 4 sets of query sequences to obtain both the similarity score and the divergence data for them. Then we built a graph plotting the similarity score threshold against the number of sequences with a similarity score greater than the threshold. From this graph we learned that less than 1% of the sequences were considered significant, even when relaxing the threshold to include very bad alignments. With this information, we plotted the percentage of the DP matrices that the second stage of the system will have to reprocess, in order to find the worst-case situation and base our estimations on it. From Figure 17 we can see that, for the experimental data considered, in the worst case the divergence region corresponds to only 22% of the DP matrices.
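The data behind a threshold-versus-count graph of this kind can be computed as in the sketch below. The function name and the example scores are hypothetical; the sketch only shows the counting step, not how the similarity scores themselves are produced.

```python
def fraction_above_thresholds(scores, thresholds):
    """For each threshold, return the fraction of scores strictly above it."""
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    return [(t, sum(s > t for s in scores) / n) for t in thresholds]


if __name__ == "__main__":
    # Hypothetical similarity scores for a handful of query sequences.
    example_scores = [1.0, 5.0, 10.0, 20.0]
    for threshold, fraction in fraction_above_thresholds(example_scores, [0, 5, 15]):
        print(threshold, fraction)
```

Plotting the returned pairs gives the curve from which the fraction of sequences passed on to the second stage can be read off at any chosen threshold.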
HIV (human immunodeficiency virus) causes AIDS (acquired immunodeficiency syndrome), which leads to life-threatening opportunistic infections. It is one of the most serious, deadly diseases in human history. In the last two decades, more than 60 million people have been infected with HIV. After getting into the body, the virus kills or damages cells of the body's immune system. The body tries to keep up by making new cells or trying to contain the virus, but eventually HIV wins out and progressively destroys the body's ability to fight infections and certain cancers. HIV is of two types, HIV-1 and HIV-2 [1,2]. HIV is different in structure from other retroviruses. It is roughly spherical with a diameter of about 120 nm, around 60 times smaller than a red blood cell, yet large for a virus. It is composed of two copies of positive single-stranded RNA enclosed by a conical capsid comprising the viral protein p24, typical of lentiviruses. HIV contains nine genes made up of 9749 base pairs.
To detect antibodies against polyomaviruses, virus-like particle-based enzyme immunoassays similar to the assays developed for MCPyV, LPyV and HPyV9 were used [24,28]. Briefly, ELISAs were performed in microplates coated with 100 ng of extract enriched for VLPs. VLP concentrations were determined using the Qubit Protein Assay Kit (Invitrogen). Sera were tested at 1:100 dilution, and peroxidase-conjugated goat anti-human IgG (Southern Biotech, Clinisciences, Nanterre, France) diluted 1:20,000 was used to detect binding of human and great ape IgGs. The cut-off values were set at 0.200, determined as described previously [24,28].
The low yield of chitinase-producing strains is one of the major problems in the study and application of chitinases. The production of recombinant proteins in active form is therefore of great importance for studying the applications of these enzymes. Expression systems based on E. coli have been commonly used to express heterologous proteins because this bacterium is well characterized with regard to its molecular genetics, physiology and availability of different expression systems (Balbás, 2001). In this study, PCR primers specific for the chiC gene were used to amplify a chitinase gene fragment and to confirm the presence of the PsChiC gene in Pseudomonas strain TXG6-1. The chitinase PsChiC gene was cloned into the expression vector pET30a to construct the recombinant expression plasmid, pETPsChiC. The fusion protein PsChiC-His was successfully expressed in E. coli cells. The ability to produce this chitinase in E. coli will facilitate large-scale production and subsequent structural and functional studies of this protein.