DNA Metabarcoding

From the document by Tiayyba Riaz (pages 30-34)

In the sections above we briefly discussed DNA metabarcoding and environmental samples. In this section we define these terms precisely and discuss them in detail. We first explain what an environmental sample is, because metabarcoding is based on the use of such samples.

An environmental sample is a mixture of organic and inorganic materials taken from the environment, for example a water sample taken from the deep sea to study biotic communities, a soil sample taken from an ecosystem to study species diversity, or a feces sample taken to study the diet of a certain animal species. These types of samples can contain live microorganisms, small macro-organisms such as nematodes or springtails, and remains of dead macro-organisms present around the sampling site. The DNA they contain can be extracted and, although it is partially degraded, short sequences can be amplified and sequenced. Soil and deep sea water samples represent a potential source of information about all the organisms living in them, and these samples can be used to obtain an overview of organism diversity through metabarcoding approaches.

Initially, most studies of environmental samples focused on microbial communities (Herrera et al., 2007; Vicente et al., 2007; Zinger et al., 2008). In this case DNA sequences of several hundred base pairs can be retrieved, because DNA of good quality is extracted from live microorganisms. However, environmental samples can also be used to characterize the diversity of macro-organisms such as plants or animals in an ecosystem. Here the DNA comes from dead macro-organisms and in most cases is highly degraded, so only short sequences can be amplified.

DNA metabarcoding, or environmental barcoding, corresponds to identification at any taxonomic level (not restricted to the species level) using any suitable DNA marker (not just the standardized markers). Thus the identification of genera or families from an environmental sample, using a suitable short DNA fragment that has not been recognized as the standardized barcode, can be considered DNA metabarcoding. Metabarcoding requires DNA extraction from a pooled environmental sample, PCR amplification from a mixture of degraded DNA templates, sequencing of large numbers of DNA barcodes using high-throughput sequencing techniques, and analysis of the resulting large volume of sequence data. DNA metabarcoding thus has the potential to provide accurate measures of genetic richness in the quantitative samples taken at each sampling point.

1.5.1 DNA Metabarcoding With New Sequencing Techniques

The classical barcoding system is based on the Sanger sequencing approach (Sanger et al., 1977) and targets single specimens. Sanger sequencing yields a read length of 800–1000 bp.

This approach is not feasible for environmental samples, where mixtures of organisms are under investigation. However, next-generation sequencing systems have recently become available (Hudson, 2008; Schuster, 2008). These new sequencing technologies can help to analyze biodiversity directly in bulk environmental samples, thanks to their massively parallel capability to read thousands of sequences from mixtures (Hajibabaei et al., 2009).

Fast and cheap sequencing of DNA in short segments is the most innovative recent development. Several new sequencing techniques have been developed that parallelize the sequencing process, allowing thousands or millions of sequences to be read simultaneously (Church, 2006; Hall, 2007). These methods include the 454 implementation of pyrosequencing, Solexa/Illumina reversible terminator technologies, polony sequencing and AB SOLiD. The typical read length is 500 bp for 454 GS FLX/Roche, 100 bp for Solexa/Illumina, and 25–50 bp for polony sequencing and AB SOLiD. The enormous number of relatively long sequences produced by 454 GS FLX/Roche and Solexa/Illumina makes these new sequencers suitable for environmental barcoding studies, where scientists have to deal with complex samples composed of a mixture of many species, e.g. deep sea biodiversity (Sogin et al., 2006) and diet analysis (Shehzad et al., submitted).

1.5.2 Barcode Designing For Metabarcoding Applications

Having discussed the usability of DNA metabarcoding and its importance in ecological studies, the question now arises of how this approach can be used successfully. Considering the broader view of metabarcoding and its applications in biodiversity, forensics, diet analysis and paleoecological studies, all of which are based on the analysis of environmental samples, it is easy to conclude that the standard barcode markers defined by CBoL are not suitable for metabarcoding studies. To perform DNA metabarcoding effectively, the first step of a study should be the selection of the best DNA regions to use as barcodes, given the aim of the study. It has been suggested that shorter barcoding markers should be used (Taberlet et al., 2007). However, before discussing the design of barcode markers, we need to know the properties of an ideal barcode marker.

From both theoretical and experimental points of view, an ideal barcode marker should have the following properties (Valentini et al., 2009).

• The DNA region selected as barcode should be nearly identical among individuals of the same species, but different between species, giving it a strong discriminating power.

• It should be standardized as defined by CBoL so that the same DNA region could be used for different taxonomic groups.

• The target DNA region should contain enough phylogenetic information, i.e. the level of divergence between the reference sequences should reflect the level of divergence between the actual species, so that unknown or not yet "barcoded" species can easily be assigned to their respective taxonomic group (genus, family, etc.).

• It should be flanked by two regions that are highly conserved from one species to another, to allow amplification of the fragment by PCR in as many species as possible, thus ensuring good taxonomic coverage. This is particularly important when using environmental samples, where each extract contains a mixture of many species to be identified at the same time. This property is also important for simplifying PCR amplification conditions, to reduce disequilibrium in amplification among the different DNA templates and to avoid the production of possible chimeric products.

• The target DNA region should be short enough to allow amplification of degraded DNA. Usually, DNA regions longer than 150 bp are difficult to amplify from degraded DNA.
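Two of the properties listed above, discriminating power and short length, can be checked on a set of candidate barcode sequences. The following Python code is a minimal sketch under stated assumptions, not a method from the source: the function names, the toy sequences and the use of exact sequence identity as the measure of discrimination are all illustrative choices. Only the 150 bp limit comes from the text.

```python
from collections import defaultdict


def barcode_resolution(barcodes):
    """Fraction of species whose barcode is not shared with another species.

    barcodes: list of (species_name, barcode_sequence) pairs.
    Assumption: two species are considered indistinguishable only when
    they share an exactly identical barcode sequence.
    """
    species_by_seq = defaultdict(set)
    for species, seq in barcodes:
        species_by_seq[seq.upper()].add(species)

    all_species = {species for species, _ in barcodes}
    ambiguous = set()
    for species_set in species_by_seq.values():
        if len(species_set) > 1:
            # The same sequence occurs in several species.
            ambiguous |= species_set
    return (len(all_species) - len(ambiguous)) / len(all_species)


def short_enough(seq, max_len=150):
    """True if the region is at most max_len bp (150 bp limit from the text)."""
    return len(seq) <= max_len


# Hypothetical toy data: species C and D share the same barcode.
toy = [
    ("species_A", "ACGTACGTAA"),
    ("species_A", "ACGTACGTAA"),
    ("species_B", "ACGTTCGTAA"),
    ("species_C", "ACGGACGTAA"),
    ("species_D", "ACGGACGTAA"),
]
resolution = barcode_resolution(toy)
print(resolution)
```

On the toy data, two of the four species carry a barcode that no other species shares. A real evaluation would use reference sequences from a public database and would probably tolerate small intraspecific differences.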

Depending on the scientific and technical context, the various categories of users (e.g. taxonomists, ecologists, etc.) will not give the same priority to the five criteria listed above. The first three criteria are the most important for taxonomists (DNA barcoding sensu stricto), whereas ecologists working with environmental samples will favor the last two. Unfortunately, no existing marker has all these properties and is suitable for metabarcoding applications. Moreover, different metabarcoding applications may need different barcode markers. In the following subsection we will see how barcode markers specific to a particular application can be designed efficiently, given the aims of the study.

Barcode Design Workflow

To design the barcodes that are most relevant to a particular study, we can make use of the large public sequence databases that exist today (Ficetola et al., 2010).

We can perform a database search to extract the sequences that belong to a targeted organism or taxon. Most sequences can be downloaded from GenBank, EMBL or DDBJ. To search for the sequences relevant to a particular study, for example in NCBI's GenBank, the BLAST program (Altschul et al., 1997) can be used. For downloading the sequences, NCBI provides Entrez (Wheeler et al., 2006), a web-based search and retrieval system for the major databases. Once we have our target sequences as input, we can identify the conserved regions shared by these sequences in order to design barcode markers. Finally, the selected conserved regions need to be checked against certain criteria before they are used as PCR primers and eventually as barcode markers.
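The last step of this workflow, checking candidate conserved regions before using them as PCR primers, can be sketched as a simple filter. The text does not list the criteria, so everything in the Python code below is an assumption made for illustration: the length window, the GC content window, and the melting temperature window are common rules of thumb, and the melting temperature is estimated with the Wallace rule (2 °C per A or T, 4 °C per G or C), which is only a rough approximation.

```python
def gc_fraction(primer):
    """Fraction of G and C bases in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Rough melting temperature in degrees Celsius (Wallace rule)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def passes_primer_checks(primer, min_len=18, max_len=25,
                         min_gc=0.40, max_gc=0.60,
                         min_tm=50, max_tm=65):
    """Apply illustrative thresholds (assumptions, not from the source)."""
    return (min_len <= len(primer) <= max_len
            and min_gc <= gc_fraction(primer) <= max_gc
            and min_tm <= wallace_tm(primer) <= max_tm)


# Hypothetical candidate regions.
candidates = [
    "ACGTACGTACGTACGTACGT",  # 20 nt, balanced composition
    "ATATATATATATATATATAT",  # 20 nt, no G or C
    "GCGC",                  # too short
]
kept = [p for p in candidates if passes_primer_checks(p)]
print(kept)
```

Only the first candidate passes these illustrative thresholds. In practice the thresholds would be tuned to the PCR protocol, and further checks (self-complementarity, 3' end stability, specificity against non-target taxa) would normally be added.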

Among all these steps, finding conserved regions (also called repeated patterns) is the most challenging task. It is an important and widely studied problem in computational molecular biology, and a number of different computer science techniques exist for finding such regions.
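As a very simple illustration of the problem, the Python code below reports the words of fixed length k that occur in at least a given fraction of the input sequences. This is a naive sketch under stated assumptions, not one of the techniques referred to in the text: it requires exact matches, whereas practical approaches generally tolerate some mismatches, and the function names, the `quorum` parameter and the toy sequences are hypothetical.

```python
import math
from collections import Counter


def kmers(seq, k):
    """Set of all distinct words of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def conserved_kmers(sequences, k, quorum=1.0):
    """Words of length k present in at least quorum * N of the N sequences.

    Each sequence counts a word at most once, so the count of a word is
    the number of sequences that contain it.
    """
    counts = Counter()
    for seq in sequences:
        counts.update(kmers(seq.upper(), k))
    needed = math.ceil(quorum * len(sequences))
    return sorted(word for word, n in counts.items() if n >= needed)


# Hypothetical toy sequences sharing the region GACCAGT.
toy_sequences = [
    "TTGACCAGTAAC",
    "GGGACCAGTCCA",
    "AAGACCAGTTTG",
]
shared = conserved_kmers(toy_sequences, k=6)
print(shared)
```

The two words returned for the toy data overlap and together cover the shared region GACCAGT. Merging overlapping words and allowing mismatches are the natural next refinements, and they are where the computational difficulty mentioned above arises.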