Exploring Polar Microbiomes as Source of Bioactive Molecules
Adriana Isabel Correia Rego
Master's in Cell and Molecular Biology
Department of Biology, Faculty of Sciences of the University of Porto
Master's Dissertation 2016/2017
Supervisor: Pedro Leão, FCT Researcher, Interdisciplinary Centre of Marine and Environmental Research (CIIMAR)
Co-supervisor: Catarina Magalhães, FCT Researcher, Interdisciplinary Centre of Marine and Environmental Research (CIIMAR) and Invited Assistant Professor, FCUP
All corrections determined by the jury, and only those, have been made. The President of the Jury,
Acknowledgements
First of all, I want to thank my parents, without whose support none of this would have been possible.
I thank my supervisors, Pedro Leão and Catarina Magalhães, for the trust they placed in me, for all the knowledge they shared and the time they spent, and for the opportunity to carry out challenging and fulfilling research work.
I also want to thank Teresa Martins, António Sousa and Inês Ribeiro, companions on this journey, for all the good moments shared and for the mutual help. Special thanks to Teresa for her friendship and tireless help with the chemistry, to António for his help and patience with the bioinformatic analysis, and to Inês for her company and help in carrying out the assays.
I thank the Ecobiotec laboratory team, especially Fátima Carvalho and Mafalda Baptista, for their help with the bacterial isolations. The Bioinformatics group, Maria Paola and António, for everything they taught me and for the good-humoured moments. The whole BBE team, particularly João, Raquel and Vítor, for providing the strains from the culture collection and the DNAs. Tiago, for all his help with the cytotoxicity assays. Also Jorge, for his company at lunch and for always having a word of encouragement.
Finally, I thank Alfredo for his unconditional support.
I also thank NORTE2020, the European Regional Development Fund (FEDER), the structured R&D&I programmes MarInfo - NORTE-01-0145-FEDER-000031 and INNOVMAR - NORTE-01-0145-FEDER-000035, NOVELMAR and the Programa Polar Português (PROPOLAR) for funding.
Summary
With the rising incidence of multidrug-resistant bacterial strains and of diseases such as cancer, there is a pressing need to find new potential drugs. Natural products of microbial origin underlie a series of highly important drugs currently used against a wide variety of illnesses. The most recently adopted strategies for finding new molecules rely on the genetic analysis of biosynthetic gene clusters or genomes, as well as on mining for biosynthetic genes (particularly of the PKS and NRPS types), combined with bioactivity- and structure-guided discovery. The study of microorganisms inhabiting extreme environments is likewise a promising strategy, since a large proportion of these microorganisms are expected to be still unknown and to possess adaptive strategies unique to their habitat, including the production of new molecules.
In this work, we combine both strategies, biosynthetic gene mining and bioactivity testing, to assess the bioactive potential of a polar microbiome, the Dry Valleys of Antarctica. Two main objectives were defined:
(1) The design of primers able to amplify the biosynthetic domains KS and A of PKS and NRPS genes, respectively.
(2) The analysis of microbial diversity from pyrosequencing data, together with the isolation of microorganisms from Antarctic environmental samples and a screening of the bioactive potential of the isolates through bioassays.
Two new primer pairs were developed in this work, able to efficiently amplify the biosynthetic domains (KS and A) from bacterial strains belonging to at least the phyla Actinobacteria, Cyanobacteria, Proteobacteria and Planctomycetes, and useful for large-scale screening of bacterial isolates and for metagenomic bioprospecting studies. If future validation confirms their efficiency, these primers may become a new standard for PCR-based metagenomic bioprospecting.
The Antarctic environmental samples revealed a high diversity of chemically prolific phyla, in particular Actinobacteria and Cyanobacteria. Bacterial isolates from the phyla Actinobacteria, Firmicutes and Proteobacteria were obtained, as well as fungal species. Many of the isolates showed bioactivity in different assays, in particular a fractionated extract with antimicrobial properties produced by a fungus of the genus Penicillium. Two potential new species from two different genera are presented, which also have the genetic capacity to produce secondary metabolites.
Keywords
Natural products, secondary metabolism, primers, microbial diversity, PKS, NRPS, bioactivity
Abstract
With the increasing incidence of multidrug-resistant bacterial strains and of diseases such as cancer, there is an urgent need to find new potential drugs. Microbial natural products have yielded a variety of pharmaceutically important compounds in current use. Present strategies for finding new molecules rely on the genetic analysis of biosynthetic gene clusters/genomes as well as gene mining (for PKS and NRPS genes), combined with bioactivity- and structure-guided discovery. Furthermore, the study of microorganisms inhabiting extreme environments is also regarded as a promising strategy, as a large fraction of their microbiota is expected to be still unknown and these organisms are expected to possess unique adaptations to their habitats, including the production of novel molecules.
Here, we combine both biosynthetic gene mining and bioactivity-guided strategies to survey the bioactive potential of a polar microbiome, the McMurdo Dry Valleys, in Antarctica.
To achieve this, two main objectives were pursued:
(1) the design of primer pairs to amplify the KS and A domains of PKS and NRPS genes, respectively, from a wide range of chemically prolific bacterial phyla, and
(2) biodiversity analysis of pyrosequencing data, isolation and growth of microorganisms from Antarctic environmental samples and screening of the bioactive potential of the isolates through in vitro assays.
Improved primer pairs were obtained that efficiently amplify the biosynthetic domains from bacterial strains of at least the phyla Actinobacteria, Cyanobacteria, Proteobacteria and Planctomycetes, and that are useful for large-scale screening of bacterial isolates and for bioprospecting in metagenomic studies. If further validation confirms their efficiency, our primers may become a new standard for PCR-based metagenomic bioprospecting. The Antarctic samples were found to harbour a large diversity of prolific phyla, mainly Actinobacteria and Cyanobacteria. Bacterial strains from the phyla Actinobacteria, Firmicutes and Proteobacteria, as well as fungal strains, were isolated. Bioactivity is reported for the first time for several strains, and a potential antimicrobial compound from a fungus of the genus Penicillium is described. Furthermore, two potential novel species from two genera are reported which, according to the biosynthetic domain mining, are worth exploring.
Keywords
natural products, secondary metabolism, primers, microbial diversity, PKS, NRPS, bioactivity
Table of contents
Acknowledgements
Summary
Abstract
List of Figures
List of Tables
List of presentations
List of abbreviations
I – Introduction
1 - A new era in Natural products - Gene and Genome mining for discovery of (novel) molecules
1.1 - Culture-dependent approach
1.2 - Culture independent approach – Metagenomics
Chapter 1 - Design of primer pairs targeting the biosynthetic domains of PKS and NRPS genes in Bacteria
I – Background
II – Goals
III – Materials and Methods
1 - Design of oligonucleotide primers
1.1 - Sequence retrieval for KS Domain of Type I PKS genes and AD domain for NRPS genes
1.2 - Multiple-sequence alignment and Phylogenetic Analysis
1.3 - In silico analysis of primers reliability
1.5 - Optimization of the PCR Amplification protocol
1.5.1 – Genomic DNA extraction and quantification
1.5.2 – Optimization of PCR reactions: reagent and thermal conditions
1.5.3 – Comparison of amplification results for protocols using designed primers or literature primers
1.6 - Sequencing of PCR products – Test and Phylogenetic analysis
1.6.1 – Phylogenetic and NaPDoS analysis
IV – Results
V - Discussion
Chapter 2 – Biodiversity and bioactive potential of the McMurdo Dry Valleys, Antarctica
I – Background
II – Goals
1 – Biodiversity of a Soil Transect and of a Rock with endolithic colonization in Victoria Valley, Victoria Land, Antarctica
1.1 - Sample collection
1.2 - eDNA Extraction and 16S rRNA gene sequencing
1.2.1 - QIIME Analysis of the 16S rRNA gene
1.3 - Prediction of the microbiome metabolic capacity using PICRUSt
2 - Isolation of Microorganisms from a Soil Transect and endolithic sample from Victoria Valley
2.1 – Culture strategies – Soil samples T5 and T6
2.2 - Identification of Bacterial and Fungal Isolates through 16S rRNA and ITS gene amplification and Phylogenetic analysis
2.2.1 - Identification of bacterial and fungal isolates using FTA Indicating Micro cards (Whatman™)
2.2.2 – Identification of bacterial isolates through 16S rRNA gene amplification
2.2.2.1 – DNA extraction
2.2.2.2 – PCR Amplification of the 16S rRNA gene
2.3 – Phylogenetic analysis
3 - Screening by PCR of PKS and NRPS genes in bacterial isolates
4 - Preparation of organic extracts for Bioactivity-Guided Isolation of Bioactive Molecules
4.1 – Organic extraction for Bioactivity Screenings
4.2 - Organic extraction (methanol) and fractionation (VLC) from the Penicillium citrinum strain
4.2.1 – Organic extraction
4.2.2 – Fractionation of the organic extract
4.2.2.1 - Flash-chromatography of fraction 31 B
5 - Bioassays
5.1 - Antimicrobial screening susceptibility assay
5.2 - MTT Assay
IV - Results
1 - Biodiversity of a soil transect and of a rock with endolithic colonization in Victoria Valley, Victoria Land, Antarctica
1.1 - Alpha-diversity
1.2 – Beta-diversity
1.3 – Taxonomic composition
1.4 – Predicted Functional profile from 16S rRNA gene
2 - Biodiversity of culturable strains from the McMurdo Dry Valleys, Antarctica
2.1 – Isolation, identification and phylogenetic analysis of obtained Isolates
2.1.2 – Actinobacterial strains
2.1.3 – Proteobacteria Isolates
2.1.4 – Fungi Isolates
3 – Bioactive potential of Isolated Microorganisms
3.1 – PCR Screening of bacterial isolates: PKS and NRPS genes
3.2 – Bioassay Screening
3.2.1 – Antimicrobial Assay
3.2.2 – Cytotoxic Assay
V - Discussion
1 - Biodiversity and Functional Profile of Endolithic and Soil Microbiomes from the McMurdo Dry Valleys
2 - Culture-dependent Isolation of Actinobacteria from McMurdo Dry Valleys
2 - Bioactive potential from McMurdo Dry Valleys Microbial Isolates
VI – General Conclusion
VII - References
List of Figures
Figure 1 – Phylogenetic tree of the KS domain sequences collected for primer design
Figure 2 - Phylogenetic tree of the AD domain sequences collected for primer design
Figure 3 - PCR amplification of KS and AD domain from cyanobacterial gDNA.
Figure 4 - PCR amplification of KS and AD domain from gDNA of Planctomycetes and Streptomyces strains.
Figure 5 – Electrophoresis gel of PCR products of KS and AD domains amplification using the optimized conditions.
Figure 6 – Phylogenetic tree encompassing the KS domain sequences used for primer design
Figure 7 – Structures of molecules produced by Antarctic Microorganisms, presented on Table 5.
Figure 8 – Location of sampling points in Victoria Valley (marked in red).
Figure 9 – (A) Fractionation apparatus; (B) Filtration of fraction through cotton.
Figure 10 – Rarefaction curves for alpha-diversity metrics. (A) chao1; (B) Phylogenetic diversity;
Figure 11 – PCoA plots using the unweighted (A) and weighted (B) UniFrac metrics.
Figure 12 - (A) Bar chart of frequency of phyla-affiliated OTUs per sampling point taxonomy summary.
Figure 13 – Microphotographs of Paenisporosarcina sp. isolates.
Figure 14 - Phylogenetic tree of the 16S rRNA nucleotide sequences of the obtained isolates (2F, 2H, 13F, 13G, 16D, 16E, 17, 34, 36, 39, 47) from Firmicutes phylum and the closest matches at NCBI 16S database.
Figure 15 - Photographs of Actinobacterial strains isolated from soil sample T5.
Figure 16 - Phylogenetic tree of the 16S rRNA nucleotide sequences of the obtained isolates from Actinobacteria phylum and the closest matches at NCBI 16S database.
Figure 17 - Photographs of Fungi strains isolated from soil sample T5 and T6.
Figure 18 - Phylogenetic tree of the ITS and D1/D2 rDNA nucleotide sequences of the obtained Fungi isolates and the closest matches at NCBI nucleotide collection.
Figure 19 - PCR amplification of KS using primer pair degK2F/degK2R and DKF/DK.
Figure 20 – PCR amplification of AD domain using primer pair A3F/A7R.
Figure 21 – PCR amplification of KS and AD domain using primer pairs degK2F/degK2R and A3F/A7R, respectively.
Figure 22 - Photographic record of inhibition halos.
Figure 23 - Photographic record of inhibition halos.
Figure 24 - Photographic record of inhibition halos of the subfractions tested.
Figure 25 - Percentage of cell viability in the tumor cell line SH-SY5Y (neuroblastoma), after 24h and 48h of exposure to organic extracts of Actinobacteria isolates.
Figure 26 - Percentage of cell viability in the tumor cell line T47-D (breast ductal carcinoma)
Figure 27 - Percentage of cell viability in the tumor cell line SH-SY5Y (neuroblastoma), after 24h and 48h of exposure to VLC fractions of Penicillium citrinum strain 31.
Figure 28 - Percentage of cell viability in the tumor cell line T47-D (breast ductal carcinoma), after 24h and 48h of exposure to VLC fractions of Penicillium citrinum strain 31.
Figure 29 - Percentage of cell viability in the tumor cell line SH-SY5Y (neuroblastoma), after 24h and 48h of exposure to VLC sub-fractions (fraction B) of Penicillium citrinum strain 31.
Figure 30 - Percentage of cell viability in the tumor cell line T47-D (breast ductal carcinoma), after 24h and 48h of exposure to VLC sub-fractions (fraction B) of Penicillium citrinum strain 31
Figure S 1 - Electrophoresis gel of PCR products of KS domain amplification using the optimized conditions. (A) using primer pair KSF1/KS_v2Rv and (B) using primer pair degK2F/degK2R.
List of Tables
Table 1 - Principal domains present in PKS and NRPS enzymes and their associated functions.
Table 2 - Some of the principal available bioinformatic tools and databases for supporting natural products discovery, with relevance for this study.
Table 4 – List of primer pairs published for amplification of KS and AD domains of PKS and NRPS genes, respectively.
Table 5 - Distribution of KS and AD domain sequences selected from the 10 bacterial phyla and group in study.
Table 6 - Example of new bioactive molecules retrieved from Antarctic Microorganisms. The respective structures are depicted below in Figure 7.
Table 7 – List of primers used in this work.
Table 8 - Solvent mixtures (eluents) utilized in the fractionation of the crude extract from Penicillium citrinum strain 31
Table 9 - Solvent mixtures used for elution on Flash-Chromatography of fraction 31B.
Table 10 – PICRUSt KEGG pathways.
Table 11 – Antimicrobial activity of organic extracts tested.
Table 12 - Antimicrobial activity of VLC fractions from the crude extract of Penicillium citrinum strain 31.
Table 13 - Antimicrobial activity of flash-chromatography sub-fractions from 31B-1 to 31B-9.
Table 14 - Summary table of obtained isolates and results from PKS/NRPS genes and bioassays screening.
Table S 1 - Information of nucleotide sequences of KS domain collected for primer design.
Table S 2 - Information of nucleotide sequences of AD domain collected for primer design.
Table S 3 – Primer pairs designed for KS and AD domain.
Table S 4 – Information of bacterial strains used for primer testing, including antiSMASH genome analysis.
Table S 5 - Detailed information of alpha-diversity measure obtained for each sample in study, including number of sequences, average of Phylogenetic diversity, chao1, observed OTUs metrics.
Table S 6 - Summary table of taxonomic frequency distributions at Genus level for Cyanobacteria.
Table S 7 - Summary table of taxonomic frequency distributions at Phylum level.
List of presentations
Rego A, Costa MS, Ramos V, Hong SG, Vasconcelos V, Magalhães C, Leão P (2017). Extreme Polar Microorganisms: Biodiversity and Chemodiversity. IJUP 2017, Porto, Portugal, February 8-10.
Oral communication
Rego A, Costa MS, Ramos V, Vasconcelos V, Baptista M, Carvalho F, Magalhães C, Leão P (2017). Biodiversity and Bioactive Potential of Antarctic Microbiomes. Bioinformatics Open Days, Braga, Portugal, February 22-24.
Poster presentation
Rego A, Costa MS, Ramos V, Hong SG, Vasconcelos V, Magalhães C, Leão P (2017). Exploring Antarctic Microbiomes as Source of Bioactive Molecules. XIIth SCAR Biology Symposium, Leuven, Belgium, 10-14 July 2017.
Oral communication
Rego A, Costa MS, Ramos V, Vasconcelos V, Magalhães C, Leão P (2016). Biodiversity and Chemodiversity of Extreme Polar Bacteria. IJUP 16, Porto, Portugal, February 17-19.
Oral communication
Rego A, Costa MS, Ramos V, Hong SG, Vasconcelos V, Magalhães C, Leão P (2016). Extreme Polar Microorganisms: A Biotechnological Approach. VIII Conferência Portuguesa de Ciências Polares, Lisboa, Portugal, October 26-28.
List of abbreviations
16S rRNA 16S ribosomal RNA
A(D) Adenylation domain
ACP Acyl carrier protein
AIA Actinomycete Isolation Agar
BGC Biosynthetic gene cluster
BLAST Basic Local Alignment Search Tool
bp Base pair(s)
CTAB Cetyltrimethylammonium bromide-polyvinylpyrrolidone-β-mercaptoethanol
DMEM Dulbecco's Modified Eagle Medium
DMSO Dimethyl sulfoxide
dNTP Deoxyribonucleotide triphosphate
e.g. Exempli gratia
END Endolithic sample
ER Enoyl reductase
ICTAR International Centre of Terrestrial Antarctic Research
ITS Internal transcribed spacer
KEGG Kyoto Encyclopedia of Genes and Genomes
KO KEGG Orthologs
KS Ketosynthase
LB Luria Broth
MB Marine Broth
MH Mueller-Hinton
MNPS Modified nutrient-poor sediment
MTT 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide
NCBI National Center for Biotechnology Information
NGS Next-Generation Sequencing
NJ Neighbor-joining
NP(s) Natural product(s)
NRP Nonribosomal peptide
NRPS Nonribosomal peptide synthetase
OTU Operational Taxonomic Unit
PCoA Principal Coordinate Analysis
PCP Peptidyl Carrier Protein
PCR Polymerase chain reaction
PD Phylogenetic diversity
PK Polyketide
PKS Polyketide synthase
PRISM Prediction Informatics for Secondary Metabolomes
PUFA Polyunsaturated fatty acid
PVC Planctomycetes, Verrucomicrobia and Chlamydiae
QIIME Quantitative insights into microbial ecology
RB Round-bottom
RKO Colon carcinoma cell line
SH-SY5Y Neuroblastoma cell line
TAE Tris-acetate-EDTA
TLC Thin layer chromatography
UV Ultraviolet
I – Introduction
Since the discovery of penicillin by Alexander Fleming in 1929 [1], microorganisms have been recognized as a rich source of bioactive compounds and have yielded a plethora of medically important compounds [2]. Fungi from the phylum Ascomycota and Bacteria, specifically the phyla Actinobacteria, Cyanobacteria, Proteobacteria and Firmicutes, are considered prolific producers of natural products [3]. With the current prevalence of diseases such as cancer and the increasing incidence of multidrug-resistant bacterial infections [4], it is urgent to find and develop new potential drugs. Microbial natural products (NPs) are considered among the most promising and reliable sources of novel drug leads [4].
Despite their structural and chemical diversity, a large fraction of bioactive microbial natural products belong to the polyketide (PK) and nonribosomal peptide (NRP) biogenetic families, or are hybrids of the two. Clinically important drugs, including antibiotics (e.g. the PKs erythromycin and tetracyclines, and the NRP penicillin G [5]), anticancer chemotherapeutics (e.g. the hybrid NRP-PK didemnin [6]) and immunosuppressants [7] (e.g. the NRP cyclosporine [8]), fall within these natural product families [3].
Polyketide synthases (PKSs) and nonribosomal peptide synthetases (NRPSs) are the families of enzyme complexes responsible for PK and NRP biosynthesis [9]. They are integral parts of biosynthetic gene clusters (BGCs), i.e. sets of two or more genes physically grouped on the genome that encode the biosynthetic pathway for the production of a natural product [10]. These enzymes are organized in modules and act by sequential thiotemplated assembly of acyl-CoA (PKS) and amino acid (NRPS) building blocks, catalysing C-C and C-N bond formation, respectively [11,12].
A PKS module is minimally composed of a ketosynthase (KS), an acyltransferase (AT) and a thiolation (T) domain (the latter also referred to as acyl carrier protein, ACP). It may additionally possess ketoreductase (KR), dehydratase (DH) and enoyl reductase (ER) domains [13] (see Table 1 for a description of each domain's function). To date, PKSs are classified into three classes according to the organization of their catalytic domains (non-modular, monomodular and multimodular) and into subclasses by their mode of action (e.g. iterative, non-iterative, cis-AT or trans-AT) [13,14]. Type I PKSs employ a multimodular strategy, with each module made up of specific catalytic domains for the recognition, activation and condensation of acyl-CoA, whereas in Type II and Type III PKSs each catalytic site resides on a separate protein [3,15]. Type I PKSs are responsible for the biosynthesis of macrolides, polyethers and polyenes, whereas Type II PKSs are usually associated with the formation of cyclic aromatic, often polycyclic, PK compounds [16]. Type III PKSs, also referred to as chalcone synthase-like PKSs and found predominantly in plants, are condensing enzymes without a T domain that typically act directly on acyl-CoA substrates [17].
NRPSs resemble type I PKSs in their modular organization, with each module incorporating and processing one amino acid [16]. Analogously to PKSs, an NRPS module is minimally composed of a condensation (C), an adenylation (A) and a T domain (also referred to as peptidyl carrier protein, PCP) [12], optionally possessing cyclization (Cy), epimerization (E), methyltransferase (MT) and ketoreductase (KR) domains (Table 1), among others [9]. The enzymatic domains KS, A and C, in particular, possess highly conserved core motifs [18]. Type I PKSs are frequently associated with NRPSs, co-occurring as part of the same BGC and resulting in hybrid molecules with increased structural diversity compared with non-hybrid PKs or NRPs.
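The minimal module compositions described above can be written down as a small lookup. What follows is only an illustrative sketch in Python, not part of the original work: the names `CORE_DOMAINS`, `OPTIONAL_DOMAINS`, `classify_module`, `accessory_domains` and `is_hybrid_cluster` are hypothetical, and the check deliberately ignores domain order, domain copy number and the distinctions between PKS types.

```python
# Core and optional domains per module type, taken from the text above.
# All names in this sketch are hypothetical and chosen for illustration only.
CORE_DOMAINS = {
    "PKS": {"KS", "AT", "T"},
    "NRPS": {"C", "A", "T"},
}
OPTIONAL_DOMAINS = {
    "PKS": {"KR", "DH", "ER"},
    "NRPS": {"Cy", "E", "MT", "KR"},
}


def classify_module(domains):
    """Return 'PKS' or 'NRPS' if the minimal domain set is present, else None."""
    found = set(domains)
    for kind, core in CORE_DOMAINS.items():
        if core <= found:  # every core domain is present
            return kind
    return None


def accessory_domains(domains):
    """Return the sorted optional domains present in a recognised module."""
    kind = classify_module(domains)
    if kind is None:
        return []
    return sorted(set(domains) & OPTIONAL_DOMAINS[kind])


def is_hybrid_cluster(modules):
    """A cluster counts as hybrid here if it holds both a PKS and an NRPS module."""
    kinds = {classify_module(m) for m in modules}
    return {"PKS", "NRPS"} <= kinds
```

For instance, `classify_module(["KS", "AT", "KR", "T"])` identifies a PKS module carrying one accessory KR domain, while a list lacking the AT domain is not recognised as a complete module under this simplified rule.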
Traditionally, prospecting for novel bioactive compounds depends on cultivating the microorganisms and then screening their organic (or sometimes aqueous) extracts for bioactivity. A bioactivity-guided isolation, through consecutive rounds of fractionation and bioassay testing, is then typically employed until the pure compound is isolated. This approach has uncovered over 11,000 PK and NRP products [19]. However, with the advance in DNA sequencing brought by Next-Generation Sequencing (NGS) technologies [20], an explosion of genome sequences and genome-mining studies has revealed microorganisms to be an underexplored source of PK and NRP compounds, with only a small fraction of PKS-NRPS gene clusters (10%) being associated with known products [19]. This realization triggered a renewed interest and
Table 1 - Principal domains present in PKS and NRPS enzymes and their associated functions. Adapted from Bachmann and Ravel 2009 [22] and Adamek et al. 2017 [23].
Domain | Essential function
AT – Acyltransferase | Selection and activation of acyl-CoA substrate through acylation
KS – Ketosynthase | Catalyses C-C bond formation through Claisen condensation
KR – Ketoreductase | Reduction of keto groups to hydroxyl groups
DH – Dehydratase | Dehydration of hydroxyl group to α,β-enoyl
ER – Enoyl reductase | Reduction of the enoyl double bond
T – Thiolation | Phosphopantetheinylated carrier protein shared by PKS and NRPS (also commonly referred to as ACP domain for PKSs and PCP domain for NRPSs)
TE – Thioesterase | Cleavage of mature PK/NRP via macrocyclization or release of the full-length chain
MT – Methyltransferase | Methylation of PK, and N-methylation of NRP
C – Condensation | Formation of peptide bond
A – Adenylation | Amino acid activation via intermediary adenylation
E – Epimerization | Epimerization of amino acids, flipping stereochemistry
Re – Reductase | Reduction (usually terminal) of mature PK/NRP resulting in aldehyde
Cy – Cyclization | Formation of a peptide bond and subsequent amino acid cyclization
1 - A new era in Natural products - Gene and Genome mining for discovery of (novel) molecules
1.1 - Culture-dependent approach
Despite the usefulness of NPs, pharmaceutical companies showed a growing disinterest in NP drug discovery programmes in the last decades of the previous century, in part because of the repeated isolation of known compounds [24], the laborious, time-consuming and expensive procedures needed to isolate them, and the difficulty of obtaining synthetic analogues [25]. At the same time, the emergence of combinatorial chemistry promised plenty of new compounds for bioactivity screening [26].
The advance of new sequencing technologies at the beginning of the 21st century [20] ushered in a new golden era of natural products discovery [25]. The exponential increase in the number of genome sequences available in databases made possible the development of different bioinformatic tools (Table 2), providing a more targeted discovery and overcoming most of the disadvantages referred to above. This development was the basis for an explosion of genome-mining studies supporting natural products discovery [27,28].
The so-called “genome mining” approach is useful for identifying gene clusters potentially responsible for the synthesis of novel compounds. By understanding the composition and regulation of the gene clusters, a targeted isolation and characterization of new PK and NRP molecules can be pursued, reducing prospecting time and costs. In addition, the information can be used to help activate “silent” gene clusters, as well as to optimize the conditions for heterologous expression [23], as exemplified by the production of the antibiotic pantocin B in E. coli [29]. This approach revealed unexpected enzymatic diversity and extended the knowledge of the distribution of these enzymes across the three domains of life [9]. Of particular relevance, a positive correlation between genome size and the fraction of the genome allocated to secondary metabolite biosynthesis was reported by Konstantinidis and Tiedje (2004) [30]. It has also been described that genomes larger than 3 Mb are likely to have at least one PK and NRP gene cluster [31], while genomes with fewer than 2000 ORFs are very likely to lack secondary metabolism-related genes [30].
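The two rules of thumb quoted above (genomes larger than 3 Mb, and genomes with fewer than 2000 ORFs) can be combined into a coarse pre-screen. The sketch below is only an illustration under stated assumptions: the function name `likely_has_pk_nrp_cluster` is hypothetical, the thresholds are copied from the text, and giving the ORF rule precedence when both values are supplied is an arbitrary choice of this example, not something the cited studies prescribe.

```python
def likely_has_pk_nrp_cluster(genome_size_mb=None, orf_count=None):
    """Coarse pre-screen based on the two rules of thumb in the text.

    Returns True (a PK/NRP cluster is likely), False (unlikely) or None
    (the rules make no call). Hypothetical helper for illustration only.
    """
    # Genomes with fewer than 2000 ORFs are very likely to lack
    # secondary metabolism-related genes.
    if orf_count is not None and orf_count < 2000:
        return False
    # Genomes larger than 3 Mb are likely to carry at least one cluster.
    if genome_size_mb is not None and genome_size_mb > 3.0:
        return True
    # Anything else falls outside both rules.
    return None
```

A result of `None` simply means that neither rule applies; such strains would still need PCR screening or genome mining to assess their biosynthetic potential.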
With the progress in sequencing technologies and the increasing number of DNA sequences deposited in public databases, not only genome mining but also homology-based PCR screening [32] started to be used to assess the biosynthetic potential of strains before large-scale cultivation and/or sequencing of the entire genome. KS, A and C enzymatic domains possess highly conserved core motifs [18], consensus
the primer sets developed were designed to be specific for some bacterial phyla, usually the most prolific ones such as Actinobacteria [33] and Cyanobacteria [34,35], and also for specific genera such as Streptomyces [36] (Table 3), restricting their usefulness.
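Primers that target conserved motifs across several phyla are usually degenerate, i.e. they contain IUPAC ambiguity codes that stand for more than one nucleotide. As a minimal sketch of what that implies (the function names and the example primer `GCNATGGAYCC` are hypothetical and are not primers from this work), the degeneracy of a primer and the set of concrete sequences it represents can be computed as follows:

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(seq) for seq in product(*choices)]


# Hypothetical example primer with one N (4 options) and one Y (2 options).
example = "GCNATGGAYCC"
print(degeneracy(example))   # 8
print(expand(example)[0])    # GCAATGGACCC
```

Higher degeneracy broadens the range of templates a primer can anneal to, at the cost of a lower effective concentration of each individual sequence in the reaction, which is one reason why broad-range primer design involves a trade-off.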
Together with the development of bioinformatic tools for the identification of biosynthetic gene clusters and the discovery of secondary metabolites, the amount of available information (genomes and DNA sequences) led to the creation of a series of natural product biosynthesis-related databases [37]. Some examples of these platforms are antiSMASH [38], a tool for the identification of BGCs and catalytic domains through the analysis of entire genomes/BGCs; NaPDoS [11], a web tool for the identification of catalytic domains of PKSs and NRPSs; and NRPSpredictor2 [39], a web
Table 2 - Some of the principal available bioinformatic tools and databases for supporting natural products discovery, with relevance for this study. Adapted from Medema and Fischbach 2015 [7] and Adamek et al. 2017 [23].
Tool or database | Web server URL | Brief description | Reference
antiSMASH | https://antismash.secondarymetabolites.org/ | Web-based tool for the automatic genomic identification of BGCs. | 38
NaPDoS | http://napdos.ucsd.edu/ | Web-based tool for fast identification and analysis of secondary metabolite genes. | 11
eSNaPD | http://esnapd2.rockefeller.edu/ | Web server that provides an automated analysis tool for surveying secondary metabolite gene cluster diversity in metagenomic studies. | 40
SMURF | http://jcvi.org/smurf/index.php | Web-based tool that finds secondary metabolite biosynthesis genes and pathways in fungal genomes. | 41
PRISM | http://magarveylab.ca/prism/ | Computational tool for the identification of BGCs and prediction of genetically encoded NRPs and type I and II PKs. | 24
NRPS/PKS substrate predictor | http://nrps.igs.umaryland.edu/ | Knowledge-based tool for elucidating domain organization and substrate specificity of NRPSs and PKSs. | 22
NRPSpredictor2 | http://nrps.informatik.uni-tuebingen.de/ | Predictor of A domain specificity. | 39
1.2 - Culture independent approach – Metagenomics
The ability to efficiently annotate and predict NP biosynthetic genes, as described above, opened the door to culture-independent natural products discovery. In fact, one of the main barriers faced by traditional natural products research is the ability to isolate and grow the microorganisms in the laboratory. It is presumed that the majority of prokaryotes are present in oceanic and terrestrial subsurface environments [42], typically remaining inaccessible and unstudied. One gram of soil is expected to contain 10⁷ to 10¹⁰ prokaryotic cells [42], equivalent to about 10⁶ different genomes [43]. The uncultured microorganisms present in soil and other environmental samples represent a rich reservoir of novel natural products [43]. The strategies currently employed to maximize the recovery into culture of the biodiversity present in a given sample include using a variety of media constituents, changing growth conditions, mimicking environmental conditions, using minimal media for oligotrophic sites, consecutive dilution of the original inoculum, community culture and co-culture, among others [2,43]. Nevertheless, the well-known “great plate count anomaly” [44], which refers to a cultivable fraction of the microbial richness below 1% [45], is still observed today. Hence, a huge fraction of the biodiversity (and associated chemodiversity) is lost during the culture process in the laboratory. Furthermore, because microbial secondary metabolites (i.e., natural products) are sometimes produced in response to some kind of stress or environmental stimulus (e.g. environmental stress, pathogen attack), laboratory cultures of microorganisms under standard growth conditions often do not provide access to the full natural products potential of a given isolate.
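To give a sense of scale for the figures quoted above, the following back-of-the-envelope sketch applies the "below 1%" cultivable fraction to the estimate of about 10⁶ different genomes per gram of soil. The function name `culturable_upper_bound` is hypothetical, and treating the 1% figure as a hard upper bound is a simplifying assumption of this example.

```python
def culturable_upper_bound(total, culturable_percent=1):
    """Upper bound on what is recovered in culture, using integer arithmetic.

    `total` is a count (e.g. distinct genomes per gram of soil) and
    `culturable_percent` the assumed maximum cultivable percentage.
    """
    return total * culturable_percent // 100


genomes_per_gram = 10**6  # estimate quoted in the text
recovered = culturable_upper_bound(genomes_per_gram)
missed = genomes_per_gram - recovered
print(recovered)  # 10000
print(missed)     # 990000
```

Under these assumptions, at most about 10⁴ of the roughly 10⁶ genomes in a gram of soil would be reachable through cultivation, leaving at least 99% of that diversity, and any chemistry it encodes, unexplored.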
Against this backdrop, metagenomics has emerged as a route to the uncultured biodiversity 46 and the corresponding biosynthetic diversity. The extraction of DNA from environmental samples (eDNA), i.e. the metagenome, allows, on the one hand, the identification of the bacterial species present (both cultured and uncultured) through PCR amplification and sequencing of the 16S rRNA gene 2 and, on the other hand, provides information on the diversity of biosynthetic genes through PCR amplification and sequencing of catalytic domains, usually the KS 34, A 47 and C 48 domains. This PCR-based sequencing approach can be used to identify clones carrying known biosynthetic domains in metagenomic libraries, as well as to find entirely novel molecules produced by known BGCs 7. Notably, metagenomics combined with heterologous expression of eDNA has enabled the discovery of different natural products from uncultured microorganisms 49. Combined with available bioinformatic tools, such as NaPDoS (see Table 2), a web tool for assessing BGC diversity through the phylogenetic analysis of sequence tags from PKS and NRPS genes 11, and eSNaPD (see Table 2), a web-based bioinformatic platform for the discovery of BGCs coding for novel NPs from metagenomic data 40, this approach allows the identification of potentially new BGCs and, consequently, of novel molecules.
The aforementioned PCR-based approach has been applied in a variety of studies, including biogeographic studies 50–52 that identified hotspots of bioactive potential and, more recently, the prospecting of Antarctic soils 53. In fact, the study of less exploited environments, such as extreme polar environments, has resulted in the discovery of novel species and molecules (reviewed by Wilson and Brimble, 2009 54, in “Molecules derived from the extremes of life”). Not only is a large part of the microbial diversity in these environments still unknown, but the microbiota in these habitats have also developed unique adaptations to survive the extreme environmental stresses, including the biosynthesis of exclusive chemical entities with unprecedented biological activities 54,55.
A limitation of this PCR-based approach concerns the primer sets available for amplification of the catalytic domains. The majority of PCR primers for amplification of the biosynthetic domains (mainly KS and A, but also C domains) were initially designed to be specific for certain bacterial phyla, typically the most prolific ones, such as Actinobacteria 33 and Cyanobacteria 35. Often these primers were also restricted to one catalytic domain class (e.g. exclusive to the modular subclass of Type I PKS genes 56), in part because more sequences were available for these bacterial groups. Currently, however, given the extremely large number of genome sequences in public databases, we have access to sequences from a broad variety of bacterial phyla 57, including the more abundant and chemically prolific ones. This creates an opportunity to design universal primers for PKSs and NRPSs, which would allow a more accurate representation of the real biosynthetic diversity present in environmental samples.
Accordingly, we aim to combine state-of-the-art culture-dependent and culture-independent approaches to achieve the overall goal of exploring the diversity of bioactive small molecules from polar microbiomes. Specifically, the objectives of this dissertation, which correspond to Chapters 1, 2 and 3, respectively, are:
(1) To design primer pairs capable of amplifying the KS and A domains of PKS and NRPS genes, respectively, from a wide range of chemically prolific bacterial phyla.
(2) To analyse the biodiversity and the biosynthetic richness of polar microbiomes, through amplification and sequencing of the 16S rRNA gene and PICRUSt predictions.
(3) To isolate and grow microorganisms from Antarctic environmental samples and analyse their bioactive potential through in vitro assays.
Table 3 – List of published primer pairs for amplification of the AD and KS domains of NRPS and PKS genes, respectively.
Domain and respective gene | Amplicon size | Primer name | Sequence (5’-3’) | Reference | Notes
AD - NRPS | 700-800 bp | A3F | GCSTACSYSATSTACACSTCSGG | 33 | Specific for Actinobacteria
AD - NRPS | 700-800 bp | A7R | SASGTCVCCSGTSCGGTA | 33 | Specific for Actinobacteria
AD - NRPS | 480 bp | NRPS_F | CGCGCGCATGTACTGGACNGGNGAYYT | 53 | Designed for metagenomic studies
AD - NRPS | 480 bp | NRPS_R | GGTCCGCGGGACGTARTCNARRTC | 53 | Designed for metagenomic studies
AD - NRPS | 300 bp | A2gamF | AAGGCNGGCGSBGCSTAYSTGCC | 58 | Conserved motif A2
AD - NRPS | 300 bp | A3gamR | TTGGGBIKBCCGGTSGINCCSGAGGTG | 58 | Conserved motif A3
AD - NRPS | 1000 bp | MTF2 | GCNGG(C/T)GG(C/T)GCNTA(C/T)GTNCC | 35 | Specific for Cyanobacteria
AD - NRPS | 1000 bp | MTR | GCNGG(C/T)GG(C/T)GCNTA(C/T)GTNCC | 35 | Specific for Cyanobacteria
C - NRPS | 700 bp | CnDmF | ATGCATCACATT(AG)TN(TC)(TC)NGA | 48 | For metagenomic studies
C - NRPS | 700 bp | DCCR | GTGTTNAC(AG)AA(AG)AANCC(AGT)AT | 48 | For metagenomic studies
KS - Type I PKS | 700 bp | degK2F | GCIATGGAYCCICARCARMGIVT | 59 | Specific for Type I modular PKS
KS - Type I PKS | 700 bp | degK2R | GTICCIGTICCRTGISCYTCIAC | 59 | Specific for Type I modular PKS
KS - Type I PKS | 1200-1400 bp | K1F | TSAAGTCSAACATCGGBCA | 33 | Specific for Actinobacteria
KS - Type I PKS | 1200-1400 bp | M6R | CGCAGGTTSCSGTACCAGTA | 33 | Specific for Actinobacteria
KS - Type I PKS | 700 bp | DKF | GTGCCGGTNCC(AG)TG(GATC)G(TC)(TC)TC | 34 | Specific for Type I PKS
KS - Type I PKS | 700 bp | DKR | GCGATGGA(TC)CCNCA(AG)CA(AG)(CA)G | 34 | Specific for Type I PKS
KS - Type I PKS | 700 bp | KSF | CGCTCCATGGAYCCSCARCA | 60 | Specific for Type I PKS
KS - Type I PKS | 700 bp | KSR | GTCCCGGTGCCRTGSSHYTCSA | 60 | Specific for Type I PKS
KSα - Type II PKS | 600 bp | KSα-F | TSGCSTGCTTCGAYGCSATC | 36 | Specific for Streptomyces and Type II PKS
KSα - Type II PKS | 600 bp | KSα-R | TGGAANCCGCCGAABCCGCT | 36 | Specific for Streptomyces and Type II PKS
KSα - Type II PKS | 554 bp | 540F | GGITGCACSTCIGGIMTSGAC | 61 | Specific for Actinobacteria
KSα - Type II PKS | 554 bp | 1100R | CCGATSGCICCSAGIGAGTG | 61 | Specific for Actinobacteria
KSβ - Type II PKS | 1500 bp | dp:KSα | TTCGGCGGXTTCCAGTCXGCCATG | 62 | Specific for iterative Type II KSβ PKS
KSβ - Type II PKS | 1500 bp | dp:ACP | TCCAGCAGCGCCAXCGACTCGTAXCC | 62 | Specific for iterative Type II KSβ PKS
KSβ - Type II | 350 bp | PKS_F | GGCAACGCCTACCACATGCANGGNYT | 53 | Designed for metagenomic studies
Chapter 1 - Design of primer pairs targeting the biosynthetic domains of PKS and NRPS genes in Bacteria
I – Background
PKSs and NRPSs are megaenzymes responsible for the biosynthesis of a large fraction of the NPs of pharmacological importance 63. With the recent advances in genome sequencing, it is now recognized that biosynthetic potential is not restricted to the most prolific and well-studied phyla in this regard, such as Actinobacteria and Cyanobacteria 3, but is widespread throughout the tree of life 9. Recent bioinformatic studies have suggested that under-explored bacterial groups, previously considered poor in NPs, actually possess the genetic potential to produce secondary metabolites of the NRPS and PKS types 3. Examples are members of the bacterial phyla Verrucomicrobia, Chlamydiae and Elusimicrobia.
PCR-based strategies using primers targeting biosynthetic genes, such as the KS domain 34 of PKSs and the AD 47 or C 48 domains of NRPSs, have been used to assess the bioactive potential of bacterial isolates. More recently, this approach has also been employed to survey the biosynthetic potential of microbiomes/metagenomes directly from eDNA 50–52. However, this strategy is currently limited by the use of primers originally designed to be specific to certain bacterial phyla, mainly Actinobacteria 33,36. As a result, with the current molecular tools for PCR-based screening (which is predicted to become even more frequent owing to NGS technologies), part of the biosynthetic potential remains out of reach, in particular that of phyla that have traditionally been neglected in terms of secondary metabolite biosynthesis.
Against this backdrop, we envision that better-performing, universal primer pairs for PKSs and NRPSs can prove useful for large surveys of biosynthetic potential from eDNA, which we expect to become ever more common.
II – Goals
Here, we aimed to design “universal” primer pairs, amenable to NGS-sequencing studies, able to amplify the biosynthetic domains of genes responsible for the production of pharmaceutically-relevant NPs (in this case PKs and NRPs) from a wide range of bacterial phyla (Cyanobacteria, Firmicutes, Chloroflexi, Actinobacteria, Bacteroidetes,
Planctomycetes, Verrucomicrobia and Chlamydiae group (PVC), Deltaproteobacteria, Alphaproteobacteria, Betaproteobacteria and Gammaproteobacteria).
III – Materials and Methods
1 - Design of oligonucleotide primers
1.1 - Sequence retrieval for the KS domain of Type I PKS genes and the AD domain of NRPS genes
KS domain sequences were retrieved for the seven described groups of Type I PKS (Enediyne, PUFA, Trans-AT, Hybrid, Iterative, Modular and KS1 11). Ten bacterial phyla and one bacterial group (PVC), i.e. the most abundant according to the most recent tree of life 57, were selected with the intent of covering most of the potential biodiversity. The selected phyla are listed in Table 4.
Amino acid sequences of KS domains from already characterized molecules of each PKS class were retrieved from the NaPDoS database 11. These sequences were submitted to the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST) for a tblastn search, and the corresponding nucleotide sequences were retrieved. The nucleotide sequences then served as initial queries for a blastn search against the NCBI nucleotide collection database for each of the bacterial phyla. The genomes/biosynthetic gene clusters containing homologues were analysed with antiSMASH 3.0 38. The protein sequences of the KS domains identified by antiSMASH were recovered and, through a tblastn search, the corresponding nucleotide sequences were obtained. All nucleotide sequences were analysed with NaPDoS to determine their class and were grouped according to the result obtained. The resulting nucleotide sequences also served as queries for the remaining ketosynthase classes.
For A domain of NRPS genes, the NORINE 37,64 and NaPDoS databases were used to
Table 4 - Distribution of the KS and AD domain sequences selected from the 10 bacterial phyla and one bacterial group under study.
Bacterial phyla | KS domain sequences retrieved | AD (NRPS) domain sequences retrieved
Actinobacteria | 8 | 11
Cyanobacteria | 8 | 11
Firmicutes | 10 | 14
Chloroflexi | 5 | 4
Bacteroidetes | 5 | 10
Elusimicrobia | 1 | 2
PVC Group | 8 | 3
Deltaproteobacteria | 12 | 10
Alphaproteobacteria | 7 | 9
Betaproteobacteria | 6 | 7
Gammaproteobacteria | 6 | 9
1.2 - Multiple-sequence alignment and Phylogenetic Analysis
A total of 76 KS (Supplementary Table S 1) and 90 A (Supplementary Table S 2) domain sequences collected from NCBI were used. The nucleotide sequences were aligned with ClustalW (default parameters) in MEGA7 65. The alignments were manually reviewed to remove short sequences, and the extremities were trimmed. For primer design, conserved sites were fixed at 80% in MEGA7 and the alignment was submitted to Geneious 8.1.9 66, with a consensus threshold of 75%, to favour the determination of the most conserved sites. The selected conserved sites were inspected and the degenerate primer pairs were designed manually. The aligned sequences were also subjected to a phylogenetic analysis in MEGA7. Phylogenies were reconstructed using the Neighbor-Joining (NJ) method with 1000 bootstrap replicates.
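As a rough illustration of how a consensus threshold can turn an alignment column into a degenerate primer position, the following Python sketch picks, for each column, the smallest set of most frequent bases whose joint frequency reaches the threshold and writes it as an IUPAC code. This is only an assumed simplification and not the algorithm implemented in MEGA7 or Geneious; the function names and the toy alignment are hypothetical.

```python
from collections import Counter

# IUPAC nucleotide codes keyed by the set of bases they stand for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def consensus_column(column, threshold=0.75):
    """Return the IUPAC code for the smallest set of most frequent bases
    whose combined frequency reaches the threshold (gaps are ignored)."""
    counts = Counter(base for base in column.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return "N"  # column made only of gaps or unknown symbols
    chosen, covered = set(), 0
    # Most frequent bases first; ties are broken alphabetically.
    for base, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        chosen.add(base)
        covered += n
        if covered / total >= threshold:
            break
    return IUPAC[frozenset(chosen)]


def degenerate_consensus(aligned, threshold=0.75):
    """Build a degenerate consensus from equal-length aligned sequences."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return "".join(consensus_column("".join(col), threshold) for col in zip(*aligned))


# Hypothetical toy alignment (not taken from the thesis data).
toy = ["ATGGAC", "ATGGAT", "ATGGAC", "ACGGAT"]
print(degenerate_consensus(toy, threshold=0.75))  # ATGGAY
print(degenerate_consensus(toy, threshold=0.80))  # AYGGAY
```

Raising the threshold makes more positions degenerate, which broadens coverage at the cost of a more complex primer pool.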
1.3 - In silico analysis of primer reliability
The designed primer sequences (Supplementary Table S 3) were submitted to several online servers to calculate their properties and to test their predicted performance by virtual PCR. For oligonucleotide primer properties, the primer sequences were submitted to: OligoCalc version 3.27 67, OligoAnalyzer 3.1 Tool (Integrated DNA
Technologies-http://eu.idtdna.com/calc/analyzer) and Multiple primer analyser (ThermoFisher Scientific- https://www.thermofisher.com/pt/en/home/brands/thermo-
Virtual PCR amplifications were performed with In silico simulation of molecular biology experiments (http://insilico.ehu.es/) 68,69, iPCR (product extractor; http://www.ch.embnet.org/software/iPCR_form.html) and Sequence Manipulation Suite: PCR Products (http://www.bioinformatics.org/sms2/pcr_products.html) 70.
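The idea behind a virtual PCR with degenerate primers can be sketched in a few lines of Python. The example below is a minimal illustration under strong assumptions (perfect matches only, forward primer searched on the given strand only, inosine not supported) and does not reproduce the behaviour of any of the web servers listed above; all function names and the toy template are hypothetical.

```python
import re

# Bases represented by each IUPAC code (inosine "I" is deliberately not handled).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement that also handles IUPAC ambiguity codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_pattern(primer):
    """Translate a degenerate primer into a regular expression."""
    return "".join(f"[{IUPAC[base]}]" for base in primer.upper())


def binding_sites(pattern, template):
    """Start positions of all matches; the lookahead keeps overlapping sites."""
    return [m.start() for m in re.finditer(f"(?={pattern})", template)]


def virtual_pcr(template, forward, reverse, max_len=3000):
    """Return predicted amplicons (primer sites included) for perfect matches."""
    template = template.upper()
    starts = binding_sites(primer_pattern(forward), template)
    # The reverse primer anneals to the opposite strand, so its reverse
    # complement is what appears in the template sequence.
    rev_sites = binding_sites(primer_pattern(reverse_complement(reverse)), template)
    ends = [pos + len(reverse) for pos in rev_sites]
    min_len = len(forward) + len(reverse)
    return [template[s:e] for s in starts for e in ends if min_len <= e - s <= max_len]


# Hypothetical toy template and primers, for illustration only.
toy_template = "TTT" + "ATGGAC" + "C" * 8 + "CTGCAA" + "GGG"
print(virtual_pcr(toy_template, "ATGGAY", "TTRCAG"))  # ['ATGGACCCCCCCCCCTGCAA']
```

Real tools additionally tolerate mismatches and search both strands, so this sketch will under-report products relative to them.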
1.4- Optimization of the PCR Amplification protocol
1.4.1 – Genomic DNA extraction and quantification
Genomic DNA (gDNA) from Cyanobacteria, Actinobacteria, Planctomycetes, Betaproteobacteria, Gammaproteobacteria and Alphaproteobacteria strains was used to test the efficiency of the designed primers. Only bacteria with already sequenced genomes were selected, except for Cyanobacteria, which were selected based on the presence of PKS and NRPS genes as detailed in a previous study 71. The genomes of the selected bacteria were submitted to antiSMASH analysis to survey the presence of PKS and NRPS genes. A detailed list of the bacterial strains used is given in Supplementary Table S 4.
For DNA extraction, bacteria (except Cyanobacteria) were grown in 5 mL of liquid culture medium in 50 mL Falcon tubes, under constant agitation (100 rpm) at 27 ºC. ML14 72, Marine Broth (MB) and Luria Broth (LB) media were used for Planctomycetes, Proteobacteria and Chromobacterium violaceum, respectively. Planctomycetes strains were kindly provided by Olga Lage (CIIMAR/FCUP, Porto, Portugal). Genomic DNA from Streptomyces was a kind gift from Marta Vaz Mendes (i3S, Porto, Portugal). The gDNA was then extracted using the E.Z.N.A.® Bacterial DNA Kit (OMEGA bio-tek). The manufacturer’s instructions were followed and the DNA was eluted with elution buffer in a final volume of 100 µL. The integrity of the gDNA was checked by electrophoresis on a 0.8% agarose gel prepared in 1× Tris-acetate-EDTA (TAE) buffer and stained with 1 µL of SYBR® Safe DNA Gel Stain (ThermoFisher Scientific). One microliter of DNA (with loading dye) was loaded onto each lane and the gel was run at 80 V for 30 minutes. For cyanobacterial gDNA extraction, fresh biomass from Z8 73 liquid medium (about 2 mL) was harvested for each selected cyanobacterium. The gDNA was then extracted and purified using the PureLink Genomic DNA Mini Kit (Invitrogen), applying the Gram Negative Bacterial Cell Protocol.
The DNA concentration was measured in a Qubit® 3.0 Fluorometer (Life Technologies) using a Qubit® dsDNA HS Assay Kit (Life Technologies). The manufacturer’s instructions were followed and 1 µL of each gDNA was used. The gDNAs were normalized to a final concentration of 25 ng µL⁻¹, unless the initial concentration was lower than this value.
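The normalization step follows the usual dilution relation C1·V1 = C2·V2. A minimal Python sketch is given below; the 20 µL final volume is an assumed value chosen only for illustration (the text does not state the volume used), and the function name is hypothetical.

```python
def normalisation_volumes(stock_ng_per_ul, target_ng_per_ul=25.0, final_volume_ul=20.0):
    """Return (gDNA volume, diluent volume) in microlitres.

    Uses C1*V1 = C2*V2. Samples at or below the target concentration are
    used undiluted, mirroring the exception described in the text.
    The 20 uL final volume is an assumption, not a value from the thesis.
    """
    if stock_ng_per_ul <= 0:
        raise ValueError("stock concentration must be positive")
    if stock_ng_per_ul <= target_ng_per_ul:
        return final_volume_ul, 0.0
    dna_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return dna_ul, final_volume_ul - dna_ul


# A hypothetical 100 ng/uL extract needs 5 uL of gDNA plus 15 uL of diluent.
print(normalisation_volumes(100.0))  # (5.0, 15.0)
# A hypothetical 10 ng/uL extract is below the target and is left undiluted.
print(normalisation_volumes(10.0))   # (20.0, 0.0)
```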
1.4.2. – Optimization of PCR reactions: reagent and thermal conditions
To determine the best PCR conditions, including improved specificity and strong amplification, different conditions and reagents were tested.
Three different Taq DNA polymerases were employed: GoTaq (Promega), DreamTaq (Thermo Fisher Scientific) and TaKaRa Taq Hot Start Version (Clontech).
As per the manufacturer’s instructions, the basic PCR protocol used with GoTaq polymerase consisted of: 1× Green GoTaq® Flexi Buffer (Promega), 2.5 mM MgCl2 (Promega), 500 μM of dNTP mix (Promega), 1 μM of each primer (STABVIDA), 0.5 U of GoTaq® DNA Polymerase (Promega) and 2 μL of template DNA in a 20 μL reaction. The standard PCR conditions were: initial denaturation at 94 ºC for 10 min, followed by 40 cycles of denaturation at 94 ºC for 30 s, annealing (at the determined temperature) for 30 s and extension at 72 ºC for 1 min, and a final extension at 72 ºC for 7 min.
The basic protocol for DreamTaq consisted of: 1× DreamTaq PCR Master Mix, 1 μM of each primer and 2 μL of template DNA in a final volume of 25 μL. The standard PCR conditions were: initial denaturation at 95 ºC for 2 min, followed by 30 cycles of denaturation at 95 ºC for 30 s, annealing (at the determined temperature) for 30 s and extension for 1 min, and a final extension at 72 ºC for 10 min.
The basic protocol for TaKaRa consisted of: 1× TaKaRa PCR Buffer (TAKARA BIO INC), 1.5 mM MgCl2 (TAKARA BIO INC), 250 μM dNTPs (TAKARA BIO INC), 1 μM of each primer (STABVIDA), 0.5 U TaKaRa Taq™ Hot Start Version (TAKARA BIO INC) and 2 μL of template DNA in a final volume of 20 μL. The PCR conditions were: initial denaturation at 98 ºC for 2 min, followed by 40 cycles of denaturation at 94 ºC for 30 s, annealing at the determined temperature for 30 s and extension at 72 ºC for 1 min, and a final extension at 72 ºC for 5 min.
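To make the reaction set-up concrete, the sketch below converts the final concentrations given above for the GoTaq reaction into pipetting volumes. The stock concentrations are not stated in the text and are assumed here purely for illustration, as is the function name; different stocks would give different volumes.

```python
# Stock concentrations: ASSUMED for illustration, not taken from the thesis.
ASSUMED_STOCKS = {
    "buffer": 5.0,          # 5x buffer
    "MgCl2": 25.0,          # mM
    "dNTP mix": 10.0,       # mM
    "forward primer": 10.0, # uM
    "reverse primer": 10.0, # uM
    "polymerase": 5.0,      # U/uL
}

# Final concentrations of the GoTaq reaction as described in the text.
GOTAQ_FINAL = {
    "buffer": 1.0,          # 1x
    "MgCl2": 2.5,           # mM
    "dNTP mix": 0.5,        # mM (500 uM)
    "forward primer": 1.0,  # uM
    "reverse primer": 1.0,  # uM
}
POLYMERASE_UNITS = 0.5      # U per reaction
TEMPLATE_UL = 2.0           # uL per reaction


def reaction_volumes(reaction_ul=20.0):
    """Volumes (uL) of each component for one reaction, water to final volume."""
    volumes = {
        name: reaction_ul * final / ASSUMED_STOCKS[name]
        for name, final in GOTAQ_FINAL.items()
    }
    volumes["polymerase"] = POLYMERASE_UNITS / ASSUMED_STOCKS["polymerase"]
    volumes["template DNA"] = TEMPLATE_UL
    water = reaction_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("components exceed the reaction volume")
    volumes["water"] = water
    return {name: round(vol, 3) for name, vol in volumes.items()}


for component, vol in reaction_volumes().items():
    print(f"{component:15s} {vol:6.2f} uL")
```

Keeping stocks and final concentrations in separate tables makes it easy to rerun the calculation for the gradients of primer, MgCl2 or dNTP concentration tested during optimization.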
Variations of the reaction mixtures described above were tested and included: gradients of primer, MgCl2 and dNTP concentrations, combined with the presence/absence of UltraPure™ BSA (Life Technologies), and a concentration gradient (1-3%) of DMSO (Thermo Scientific). Concerning the thermal cycling protocol, a gradient of annealing temperatures was initially performed for each primer pair to determine the best annealing temperature. Extension time and number of cycles were also optimized. A Touchdown PCR protocol 74 was also employed with the TaKaRa Taq Hot Start Version polymerase. The protocol consisted of: initial denaturation at 95 ºC for 3 min, followed by 10 cycles of denaturation at 95 ºC for 30 s, annealing at 75 ºC for 45 s and extension at 72 ºC for 25 s, followed by 25 cycles of denaturation at 95 ºC for 30 s, annealing at 60 ºC for 45 s and extension at 72 ºC for 25 s, and a final extension at 72 ºC for 5 min.
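The two-phase program described above can be written down as a small data structure, which makes it straightforward to check the number of cycles and the total hold time. The following Python sketch encodes only the temperatures and times stated in the text; the class and function names are hypothetical, and ramping time between steps is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int


@dataclass(frozen=True)
class CycleBlock:
    cycles: int
    steps: tuple


# Program as described in the text (two annealing phases: 75 C, then 60 C).
PROGRAM = (
    CycleBlock(1, (Step("initial denaturation", 95, 180),)),
    CycleBlock(10, (Step("denaturation", 95, 30),
                    Step("annealing", 75, 45),
                    Step("extension", 72, 25))),
    CycleBlock(25, (Step("denaturation", 95, 30),
                    Step("annealing", 60, 45),
                    Step("extension", 72, 25))),
    CycleBlock(1, (Step("final extension", 72, 300),)),
)


def hold_time_seconds(program):
    """Total programmed hold time, excluding ramping between temperatures."""
    return sum(block.cycles * sum(step.seconds for step in block.steps)
               for block in program)


def annealing_by_cycle(program):
    """Annealing temperature used in each amplification cycle, in order."""
    temps = []
    for block in program:
        for step in block.steps:
            if step.name == "annealing":
                temps.extend([step.temp_c] * block.cycles)
    return temps


print(hold_time_seconds(PROGRAM) / 60)   # about 66 min of programmed holds
print(len(annealing_by_cycle(PROGRAM)))  # 35 amplification cycles
```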
1.4.3 – Comparison of amplification results for protocols using designed primers or literature primers
The primer pairs degK2F/degK2R 59 and A3F/A7R 33 were used for PCR amplification of the KS and AD domains, respectively, and served as the benchmark against which the designed primers were compared. For the literature primer pairs, the thermal cycling was based on the literature protocols and was performed in a Veriti® 96-Well Thermal Cycler (ThermoFisher Scientific). The PCR reactions were prepared in a volume of 20 μL containing 1× TaKaRa PCR Buffer (TAKARA BIO INC), 1.5 mM MgCl2 (TAKARA BIO INC), 250 μM dNTPs (TAKARA BIO INC), 0.625 μL of primer (100 μM), 0.25 mg/mL of UltraPure™ BSA (Life Technologies), 0.5 U TaKaRa Taq™ Hot Start Version (TAKARA BIO INC) and 2 μL of template DNA. For amplification of the AD domain with primer pair A3F/A7R, the PCR conditions were: initial denaturation at 95 ºC for 4 min, followed by 40 cycles of denaturation at 94 ºC for 30 s, annealing at 67.5 ºC for 30 s and extension at 72 ºC for 60 s, and a final extension at 72 ºC for 5 min. For primer pair degK2F.i/degK2R.i, the conditions were: initial denaturation at 95 ºC for 4 min, followed by 40 cycles of denaturation at 94 ºC for 40 s, annealing at 56.3 ºC for 40 s and extension at 72 ºC for 75 s, and a final extension at 72 ºC for 5 min. PCR products (10 μL loaded onto each well) were separated by electrophoresis on a 1.5% (w/v) agarose gel for 40 min at 120 V, together with 5 μL of GRS Ladder 1 kb (Grisp). The gel was stained with 1 μL SYBR® Safe DNA Gel Stain (ThermoFisher Scientific), visualized under UV light in a Gel Doc XR+ System (BIO-RAD) and analysed with Image Lab™ software (BIO-RAD).
1.5 Sequencing of PCR products – Test and Phylogenetic analysis
After optimization of the PCR protocol, an initial test was carried out to evaluate primer functionality. gDNAs from Nodosilinea nodulosa LEGE 06152, Cobetia marina CECT 4278, Halomonas aquamarina CECT 5000 and Streptomyces natalensis ATCC 27448 were used for KS domain amplification. The AD domain was amplified from gDNA of Nodularia sp. LEGE 06071 and Streptomyces natalensis ATCC 27448. Thermal cycling was performed in a Veriti® 96-Well Thermal Cycler (ThermoFisher Scientific). For amplification of the KS domain, the PCR reaction was prepared in a volume of 20 μL containing 1× TaKaRa PCR Buffer (TAKARA BIO INC), 1.5 mM MgCl2 (TAKARA BIO INC), 250 μM dNTPs (TAKARA BIO INC), 1 μM of each primer (KSF1/KS_v2_Rv; STABVIDA), 3% DMSO, 0.5 U TaKaRa Taq™ Hot Start Version (TAKARA BIO INC) and 2 μL of template DNA. The PCR conditions were: initial denaturation at 98 ºC for 2 min, followed by 40 cycles of denaturation at 94 ºC for 30 s, annealing at 55 ºC for 30 s and extension at 72 ºC for 22 s, and a final extension at 72 ºC for 5 min.
For amplification of the AD domain, the reaction was prepared in a volume of 20 μL containing 1× TaKaRa PCR Buffer (TAKARA BIO INC), 1.5 mM MgCl2 (TAKARA BIO INC), 250 μM dNTPs (TAKARA BIO INC), 1 μM of each primer, 0.5 U TaKaRa Taq™ Hot Start Version (TAKARA BIO INC) and 2 μL of template DNA. The PCR conditions were: initial denaturation at 98 ºC for 2 min, followed by 40 cycles of denaturation at 94 ºC for 30 s, annealing at 54 ºC for 30 s and extension at 72 ºC for 22 s, and a final extension at 72 ºC for 5 min.
For validation of the PCR reaction, 5 μL of PCR product was separated by electrophoresis on a 1.5% (w/v) agarose gel for 40 min at 120 V. GRS Ladder 100 bp (Grisp) was used (5 μL loaded). The gel was stained with 1 μL SYBR® Safe DNA Gel Stain (ThermoFisher Scientific), visualized under UV light in a Gel Doc XR+ System and analysed with Image Lab™ software.
After validation of the reaction, PCR products (15 µL loaded onto each well) were separated by electrophoresis on a 1% (w/v) agarose gel for 60 min at 150 V and stained with 1 μL SYBR® Safe DNA Gel Stain (ThermoFisher Scientific). The bands were visualized under UV light with the Gel Doc XR+ System and excised using sterile scalpels. The bands were purified using the NZYGelpure kit (NZYTech) and sequenced by Sanger sequencing at STABVIDA (Portugal). Briefly, the following components were used: BigDye® Terminator v3.1 Cycle Sequencing Kit [Applied Biosystems]; BigDye® Terminator v1.1, v3.1 5× Sequencing Buffer [Applied Biosystems]; primer 10 μM; nuclease-free water (Ambion); purified PCR product. The sequencing products were purified with illustra™ Sephadex™ G-50 Fine DNA Grade and subjected to automated capillary electrophoresis on an ABI 3730xl Genetic Analyzer (Applied Biosystems). Visual quality control of the electropherograms was performed in Sequence Scanner v1.0 (Applied Biosystems). Raw forward and reverse sequences (ab1 files) were submitted to Geneious 8.1.9 66 for de novo assembly. The resulting consensus sequences (average length of 300 and 240 bp for the KS and AD domains, respectively) were submitted to NCBI for a blastn search against the nucleotide collection database.
1.5.1 – Phylogenetic and NaPDoS analysis
The obtained sequences were aligned with the sequences used for primer design and submitted to a phylogenetic analysis, to survey the diversity covered. KS domain
sequences were also classified using the web tool NaPDoS and a phylogenetic tree (using the NaPDoS database as reference) was constructed.
IV – Results
Alignments composed of 63 and 84 sequences with 1462 and 1506 bp for KS and AD domain, respectively, were obtained as a basis for primer design. For each alignment, a phylogenetic analysis was performed to inspect the diversity covered by the selected sequences. For the KS domain, according to the phylogenetic tree (Figure 1) it is possible to observe that the sequences are diverse and encompass all the documented classes of KSs, with a clear clustering pattern linked to function. Likewise, the phylogeny of the AD domain (Figure 2) included the known diversity of ADs and the clustering pattern was congruent with the type of domain.
A series of conserved regions was selected for primer design. In total, four forward and two reverse primers were designed for the KS domain, in different regions of the gene (Table S 3). For the AD domain, five forward and four reverse primers were designed. Initially, primer pairs KSF1/KSR1 and KSF2/KSF1 were designed for the KS domain and, according to the in silico analysis, seemed very robust. For the AD domain, primers ADFw1/ADRv1, ADFw2 and AFw2.1/ADRv2, and ADFw3/ADRv3 were initially designed. However, PCR amplification yielded several non-specific bands in both cases, or no bands at all, even after attempts to optimize the conditions. A second iteration of primer design was therefore performed, either by targeting new regions or by decreasing the degeneracy of the primers, which yielded the primers designated as v2 (Table S 3).
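Primer degeneracy, i.e. the number of distinct oligonucleotides represented by one degenerate sequence, is the product of the number of bases allowed at each position. The Python sketch below illustrates the calculation with the literature primers A3F and A7R from Table 3, since the sequences of the designed primers are only given in Supplementary Table S 3. Counting inosine as a single base is an assumption of this sketch, and the function name is hypothetical.

```python
from math import prod

# Number of bases represented by each IUPAC code.
# Inosine ("I") is counted as one synthesized base (assumption of this sketch).
IUPAC_SIZE = {
    "A": 1, "C": 1, "G": 1, "T": 1, "I": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    return prod(IUPAC_SIZE[base] for base in primer.upper())


print(degeneracy("GCSTACSYSATSTACACSTCSGG"))  # A3F: 128
print(degeneracy("SASGTCVCCSGTSCGGTA"))       # A7R: 48
```

Replacing a single N with a two-base code halves the pool size, which is one way a redesign can lower degeneracy while keeping the most common variants.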
Several combinations of primer pairs were tested, and the pairs KSF1/KS_v2_Rv and NRPS_v2_Fw/ADRv3 proved to be the most reliable. When PCR amplifications using these designed primers, albeit with a non-optimized protocol, were compared with those using the currently used literature primer pairs, a band of the expected size was observed in almost all the strains tested (Figure 3 and Figure 4), although the literature primers fared better. Amplification of the AD domain gave a good indication that the designed primers are able to recover a broader range of diversity, as a product of the expected size was obtained from cyanobacterial gDNA with the designed primers but not with primer pair A3F/A7R (Figure 3). Amplification from gDNA of Proteobacteria yielded similar results (Supplementary Figure S 1).
[Figure 1 tree image not reproduced. Clade labels: PUFA, FAS, Enediyne, Modular, Iterative, KS1, Hybrid and Trans-AT; scale bar: 0.1.]
Figure 1 – Phylogenetic tree of the KS domain sequences collected for primer design and the respective class to which they belong. The tree was computed in MEGA7 110, using the Neighbor-Joining method 182 with bootstrap analysis (1000 replications), and comprised 67 nucleotide sequences with 1462 bp. Fatty acid synthase (FAS) sequences from E. coli and Streptomyces sp. were included as the outgroup.
Figure 2 - Phylogenetic tree of the AD domain sequences collected for primer design. The tree was computed in MEGA7 [63], using the Neighbor-Joining method [135] with bootstrap analysis (1000 replications), and comprised 82 nucleotide sequences with 1506 bp. Sequences from the hybrid PKS-NRPS genes mtaD, nosB and blmVIII were included as the outgroup.
[Figure 2 tree image not reproduced. Tip labels were NRPS gene names (e.g. mycB, ndaA, mcnB) and strain names with locus tags; outgroup: mtaD, nosB and blmVIII.]