
4. Methods for identifying and validating miRNA targets

4.1. Prediction methods for miRNA targets

4.1.1. Computational methods

Numerous algorithms have been developed in order to predict miRNA targets (Table 1).

These programs take into account diverse criteria, most of them based on empirically derived rules aimed at reducing false-positive predictions or increasing the signal-to-noise ratio. These criteria include the base-pairing pattern, the thermodynamic stability of the miRNA-target RNA duplex, comparative sequence analysis to assess conservation, and the search for multiple target sites (Figure 13B) (reviewed in (Min and Yoon, 2010; Watanabe et al., 2007)).

Base-pairing criteria rely on searching for the specific patterns that characterise miRNA binding sites, and few bioinformatic tools omit the requirement for a perfect seed match (Bartel, 2009; Min and Yoon, 2010; Pasquinelli, 2012; Watanabe et al., 2007). However, some algorithms, such as miRanda (John et al., 2004) or DIANA-microT (Kiriakidou et al., 2004), allow weak pairing at the miRNA 5' end or tolerate G:U wobbles in the seed match.

The next criterion is the thermodynamic stability of the miRNA-mRNA duplex, estimated by predicting its free energy. However, data sets of known miRNA-mRNA duplexes are limited, and a low free energy (stable binding) is not always a reliable predictor of miRNA target genes. Other features must therefore also be considered (Watanabe et al., 2007). Indeed, one study showed that thermodynamic constraints can be removed without lowering the specificity of prediction, provided that evolutionary conservation derived from multi-species sequence comparison is incorporated (Lewis et al., 2005). Conservation of miRNA binding sites within the 3'UTRs of closely related species is widely used by algorithms to refine the list of predicted targets (Bartel, 2009; Pasquinelli, 2012; Watanabe et al., 2007). These algorithms first identify orthologous 3'UTR sequences and then analyse conservation across the related species (Watanabe et al., 2007).

Finally, as discussed in section 2.2.1.4, a single gene is often targeted by multiple miRNAs. Based on this observation, Stark et al. proposed checking for multiple target sites, and therefore taking the number of target sites into account in predictions (Stark et al., 2005). Many existing algorithms use this feature (Bartel, 2009), and it is one of the two basic requirements of the PicTar algorithm (Krek et al., 2005).
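The base-pairing, site-number and conservation criteria described above can be illustrated with a short Python sketch. It does not reproduce any of the tools in Table 1. It assumes the commonly used site nomenclature: a 6mer pairs with miRNA nucleotides 2-7, a 7mer-m8 also pairs with nucleotide 8, a 7mer-A1 has an A opposite nucleotide 1, and an 8mer has both. It requires perfect Watson-Crick seed pairing. It uses a deliberately naive conservation check on a pre-computed gapped alignment of orthologous 3'UTRs. All function names, the alignment and the UTR sequences are hypothetical. The let-7a sequence serves only as an example input.

```python
from collections import Counter

# Watson-Crick complements for RNA (DNA input is converted T -> U first).
COMPLEMENT = str.maketrans("ACGU", "UGCA")
SITE_LENGTH = {"8mer": 8, "7mer-m8": 7, "7mer-A1": 7, "6mer": 6}


def to_rna(seq: str) -> str:
    return seq.upper().replace("T", "U")


def revcomp(seq: str) -> str:
    """Reverse complement of an RNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_seed_sites(mirna: str, utr: str) -> list[tuple[int, str]]:
    """Return (start, site_type) for every canonical seed site in a 3'UTR.

    Starts are 0-based positions of the whole site in the UTR.
    Each site is reported once, with its longest type.
    """
    mirna, utr = to_rna(mirna), to_rna(utr)
    core = revcomp(mirna[1:7])     # mRNA sequence pairing with miRNA nt 2-7
    pairs_nt8 = revcomp(mirna[7])  # mRNA base pairing with miRNA nt 8
    sites = []
    for p in range(len(utr) - 5):
        if utr[p:p + 6] != core:
            continue
        m8 = p > 0 and utr[p - 1] == pairs_nt8         # nt 8 pairs 5' of the core
        a1 = p + 6 < len(utr) and utr[p + 6] == "A"    # A lies just 3' of the core
        if m8 and a1:
            sites.append((p - 1, "8mer"))
        elif m8:
            sites.append((p - 1, "7mer-m8"))
        elif a1:
            sites.append((p, "7mer-A1"))
        else:
            sites.append((p, "6mer"))
    return sites


def count_site_types(sites: list[tuple[int, str]]) -> Counter:
    """Number of sites per type (the 'site number' criterion)."""
    return Counter(kind for _, kind in sites)


def site_conservation(alignment: dict[str, str], ref: str,
                      start: int, length: int) -> dict[str, bool]:
    """Naive conservation check on a gapped alignment of orthologous UTRs.

    `start`/`length` are in ungapped coordinates of the reference species.
    A site counts as conserved in another species only if the aligned
    columns contain exactly the same bases (gaps count as mismatches).
    """
    columns = [c for c, base in enumerate(alignment[ref]) if base != "-"]
    site_cols = columns[start:start + length]
    ref_site = "".join(alignment[ref][c] for c in site_cols)
    return {
        species: "".join(seq[c] for c in site_cols) == ref_site
        for species, seq in alignment.items()
        if species != ref
    }


let7a = "UGAGGUAGUAGGUUGUAUAGUU"  # example miRNA (hsa-let-7a-5p)
utr = "AAACUACCUCAAAA" + "GGGUACCUCGGG"  # hypothetical 3'UTR
sites = find_seed_sites(let7a, utr)
print(sites)  # [(3, '8mer'), (17, '6mer')]
print(count_site_types(sites))
aln = {"human": "AAACUACCUCAAAA", "mouse": "AAACUACCUCAAGA", "chicken": "AAACUAC-UCAAAA"}
print(site_conservation(aln, "human", 3, SITE_LENGTH["8mer"]))  # {'mouse': True, 'chicken': False}
```

Counting the returned sites gives the site number used by many algorithms. Real tools additionally weight site types and use more elaborate conservation scores than exact identity.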

In addition, some algorithms allow the search for non-conserved binding sites. Many such sites have been shown to exist, and some of them could mediate important species-specific repression (reviewed in (Bartel, 2009)), as is the case, for example, for the targets of newly emerged species-specific miRNAs (discussed in section 2.2.2.5). Some bioinformatic prediction tools consider additional criteria. For example, Robins et al. proposed considering the folded structure of the mRNA, which determines whether a site is accessible to the miRNA. This approach also makes it possible to dispense with evolutionary conservation requirements (Robins et al., 2005).

The PITA algorithm (probability of interaction by target accessibility) was designed on this basis (Kertesz et al., 2007). An alternative approach analyses the sequence context surrounding the site: the TargetScan algorithm searches for non-conserved binding sites flanked by AU-rich regions (Grimson et al., 2007).
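The two site-level features just mentioned can be sketched as follows. The first function combines a predicted duplex free energy with the energetic cost of unpairing the target site, in the spirit of PITA's accessibility score. The sign convention used here is an assumption of the sketch: both values are in kcal/mol, and the cost is given as a positive number. The energies themselves would come from an RNA folding program, which is not shown. The second function computes the fraction of A and U in windows flanking a site. It is a simplified, unweighted stand-in for the AU-rich context feature used by TargetScan. The function names, the 30-nt flank and the example values are all hypothetical.

```python
def net_interaction_energy(dg_duplex: float, opening_cost: float) -> float:
    """Accessibility-corrected interaction energy (kcal/mol), sign convention assumed.

    dg_duplex: free energy of the miRNA-target duplex (negative = stable).
    opening_cost: energy needed to unpair the site within the folded mRNA
    (positive). More negative results suggest a more favourable site.
    """
    return dg_duplex + opening_cost


def local_au_fraction(utr: str, start: int, end: int, flank: int = 30) -> float:
    """Fraction of A/U bases in up to `flank` nt on each side of utr[start:end]."""
    context = utr[max(0, start - flank):start] + utr[end:end + flank]
    if not context:
        return 0.0
    return sum(base in "AU" for base in context.upper()) / len(context)


# Hypothetical (duplex energy, opening cost) pairs for two sites of one miRNA.
candidates = {"site_1": (-20.0, 8.5), "site_2": (-22.0, 15.0)}
ranked = sorted(candidates, key=lambda s: net_interaction_energy(*candidates[s]))
print(ranked)  # ['site_1', 'site_2']: site_1 ranks first despite a weaker duplex
print(local_au_fraction("AAAA" + "CUACCUCA" + "GGGG", 4, 12))  # 0.5
```

The example shows why accessibility matters. A site that forms a slightly less stable duplex can still be preferred if it is much cheaper to unfold.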

Some databases also combine target predictions with other features, such as expression profiles of both miRNAs and their putative mRNA targets in specific cells and/or tissues, functional annotations of the predicted targets, and/or the cellular pathways in which these targets are involved (reviewed in (Min and Yoon, 2010; Watanabe et al., 2007)).

While most algorithms rely on the criteria described above, Elefant et al. developed another type of algorithm. It considers neither evolutionary conservation nor the conventional binding patterns of characteristic miRNA target sites (e.g. a perfect seed match). This algorithm, RepTar, first identifies repetitive elements in 3'UTRs and then tests them to find miRNAs that can base-pair with them through a thermodynamically stable interaction. Once the list of these repetitive binding sites is established, the algorithm scans the 3'UTRs (or even the CDS, if desired) for single occurrences of these sites. RepTar can therefore predict conventional as well as non-conventional target sites, such as seed matches with G:U wobbles, 3' compensatory sites with mismatches in the seed pairing, and the recently characterised centered sites (Elefant et al., 2010).
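The two-step logic described for RepTar can be illustrated with a drastically simplified Python sketch. It is not RepTar's implementation. Exact repeated 7-mers stand in for the repetitive elements that RepTar detects. Base-pairing with miRNA nucleotides 2-8, with G:U wobbles allowed, stands in for its thermodynamic binding test. All names and sequences are hypothetical, and let-7a is used only as an example input.

```python
from collections import Counter

K = 7  # element length, matched to miRNA nt 2-8 (assumption of this sketch)
# Watson-Crick pairs plus G:U wobbles, as (miRNA base, mRNA base).
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def repeated_kmers(utr: str, min_copies: int = 2) -> set[str]:
    """K-mers occurring at least `min_copies` times within one UTR."""
    counts = Counter(utr[i:i + K] for i in range(len(utr) - K + 1))
    return {kmer for kmer, n in counts.items() if n >= min_copies}


def pairs_with_nt2_8(mirna: str, element: str) -> bool:
    """True if a 7-nt mRNA element (5'->3') pairs with miRNA nt 2-8, G:U allowed."""
    region = mirna[1:8]
    # Antiparallel pairing: the first element base faces miRNA nt 8, the last nt 2.
    return len(element) == K and all(
        (m, t) in ALLOWED_PAIRS for m, t in zip(reversed(region), element)
    )


def reptar_like_scan(mirna: str, utrs: dict[str, str], min_copies: int = 2):
    """Return (elements, hits): repeated elements the miRNA can pair with,
    and for each UTR the 0-based positions of any occurrence of them."""
    # Step 1: repetitive elements that the miRNA can pair with.
    elements = set()
    for utr in utrs.values():
        elements |= {e for e in repeated_kmers(utr, min_copies) if pairs_with_nt2_8(mirna, e)}
    # Step 2: scan all UTRs again, now reporting single occurrences as well.
    hits = {}
    for name, utr in utrs.items():
        positions = [i for i in range(len(utr) - K + 1) if utr[i:i + K] in elements]
        if positions:
            hits[name] = positions
    return elements, hits


let7a = "UGAGGUAGUAGGUUGUAUAGUU"
utrs = {"UTR_1": "CUACCUCAAACUACCUCAA", "UTR_2": "GGGCUACCUCGGG", "UTR_3": "GGGGGGGGGG"}
elements, hits = reptar_like_scan(let7a, utrs)
print(elements)  # {'CUACCUC'}
print(hits)      # {'UTR_1': [0, 10], 'UTR_2': [3]}
print(pairs_with_nt2_8(let7a, "UUACCUC"))  # True: G:U wobble opposite nt 8
```

The element is first learned from its repeats in UTR_1 and is then found as a single site in UTR_2. This mirrors the two steps described above, although in a much cruder form.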

| Tool | Clades | Criteria for prediction and ranking | Reference |
|---|---|---|---|
| Site conservation considered | | | |
| TargetScan | Mammalian/vertebrate | Stringent seed pairing, site number, site type, site context (which includes factors that influence site accessibility); option of ranking by likelihood of preferential conservation rather than site context | (Friedman et al., 2009) |
| TargetScan | Fly, worm | Stringent seed pairing, site number, site type | (Ruby et al., 2006, 2007) |
| EMBL | Fly | Stringent seed pairing, site number, overall predicted pairing stability | (Stark et al., 2003) |
| PicTar | Mammalian/vertebrate, fly, worm | Stringent seed pairing for at least one of the sites for the miRNA, site number, overall predicted pairing stability | (Krek et al., 2005) |
| miRanda | Mammalian/vertebrate, fly, worm, others | Moderately stringent seed pairing, site number, pairing to most of the miRNA | (John et al., 2004) |
| miRBase Targets | Mammalian/vertebrate, fly, worm, others | Moderately stringent seed pairing, site number, overall pairing | (Griffiths-Jones et al., 2008) |
| PITA Top | Mammalian/vertebrate, fly, worm | Moderately stringent seed pairing, site number, overall predicted pairing stability, predicted site accessibility | (Kertesz et al., 2007) |
| Site conservation not considered | | | |
| TargetScan | Mammalian/vertebrate | Stringent seed pairing, site number, site type, site context (which includes factors that influence site accessibility) | (Grimson et al., 2007) |
| PITA All | Mammalian/vertebrate, fly, worm | Moderately stringent seed pairing, site number, overall predicted pairing stability, predicted site accessibility | (Kertesz et al., 2007) |
| RNA22 | Mammalian/vertebrate, fly, worm | Moderately stringent seed pairing, matches to sequence patterns generated from miRNA set, overall predicted pairing and predicted pairing stability | (Miranda et al., 2006) |

Table 1: Comparison of tools for predicting animal miRNA targets. Adapted from (Bartel, 2009).

A strong bias persists in most algorithms: the majority of them are restricted to the analysis of 3'UTRs and do not predict binding sites in the CDS or even in 5'UTRs. This restriction is partly linked to the historical identification of the first target sites (discussed in section 2.2.1.4) and to the evolutionary conservation pattern of UTRs. However, recent studies using the rna22 algorithm (Miranda et al., 2006) have shown that miRNAs can extensively target the CDS (Lal et al., 2008; Tay et al., 2008a, 2008b).

Computational predictions therefore offer a straightforward way to predict large numbers of miRNA targets, but they remain imperfect. Most algorithms produce widely divergent predictions, with varying rates of false positives and false negatives that are difficult to estimate (Min and Yoon, 2010; Thomson et al., 2011). Before predicted targets can be considered genuine, they need experimental validation, which by contrast is challenging and labour-intensive. One intermediate step for reducing the number of false-positive predictions is the use of large-scale experimental approaches, such as genome-wide analyses or biochemical methods. However, bioinformatic analyses remain in most cases an unavoidable step in miRNA target prediction, as they are among the few methods that can determine the precise binding sites. It should therefore be kept in mind that, unlike computational approaches, most of the experimental methods described below are not sufficient on their own to establish a list of putative miRNA targets, and they generally need to be combined with bioinformatics.
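The sketch below shows how a genome-wide experiment can be used to reduce false positives. It sorts computationally predicted targets according to their change in expression after, for instance, overexpression of the miRNA. These changes are assumed here to be available as log2 fold changes. The -0.5 cut-off, gene names and values are hypothetical. As argued above, the experiment only narrows down the list, while the binding site itself still comes from the prediction.

```python
def triage_predictions(predicted: set[str], log2fc: dict[str, float],
                       max_log2fc: float = -0.5) -> tuple[list[str], list[str], list[str]]:
    """Split predicted targets by their response in a genome-wide experiment.

    supported:   measured and down-regulated at or below `max_log2fc`
    unsupported: measured but not down-regulated enough
    unmeasured:  absent from the experimental data set
    """
    supported, unsupported, unmeasured = [], [], []
    for gene in sorted(predicted):
        if gene not in log2fc:
            unmeasured.append(gene)
        elif log2fc[gene] <= max_log2fc:
            supported.append(gene)
        else:
            unsupported.append(gene)
    return supported, unsupported, unmeasured


predicted = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}  # hypothetical prediction output
log2fc = {"GENE_A": -1.2, "GENE_B": 0.1, "GENE_C": -0.6, "GENE_E": -2.0}  # hypothetical data
supported, unsupported, unmeasured = triage_predictions(predicted, log2fc)
print(supported)    # ['GENE_A', 'GENE_C']
print(unsupported)  # ['GENE_B']
print(unmeasured)   # ['GENE_D']
```

GENE_E is down-regulated but was not predicted, so it is ignored here. Such a gene could reflect an indirect effect or a target missed by the algorithm. A down-regulation alone does not show which site, if any, is bound.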