Exploring Polar Microbiomes as Source of Bioactive Molecules

Adriana Isabel Correia Rego

Master's in Cell and Molecular Biology

Department of Biology, Faculty of Sciences of the University of Porto

Master's Dissertation 2016/2017

Supervisor: Pedro Leão, FCT Researcher, Interdisciplinary Centre of Marine and Environmental Research (CIIMAR)

Co-supervisor: Catarina Magalhães, FCT Researcher, Interdisciplinary Centre of Marine and Environmental Research (CIIMAR) and Invited Assistant Professor, FCUP

All corrections determined by the jury, and only those, were made. The President of the Jury,

Acknowledgements

First of all, I want to thank my parents, without whose support none of this would have been possible.

I thank my supervisors, Pedro Leão and Catarina Magalhães, for the trust they placed in me, for all the knowledge shared and time spent, and for the opportunity to carry out challenging and fulfilling research work.

I also want to thank Teresa Martins, António Sousa and Inês Ribeiro, companions on this journey, for all the good moments shared and the mutual help. In particular, I thank Teresa for her friendship and tireless help with the chemistry, António for his help and patience with the bioinformatic analysis, and Inês for her company and help in carrying out the assays.

I thank the Ecobiotec laboratory team, especially Fátima Carvalho and Mafalda Baptista, for their help with the bacterial isolations. I thank the Bioinformatics group, Maria Paola and António, for everything they taught me and for the good-humoured moments. I thank the whole BBE team, particularly João, Raquel and Vítor, for providing the strains from the culture collection and DNAs. I thank Tiago for all his help with the cytotoxicity assays. I also thank Jorge for his company at lunch and for always having a word of encouragement.

Finally, I thank Alfredo for his unconditional support.

I also thank NORTE2020, the European Regional Development Fund (FEDER), the structured programmes R&D&I MarInfo - NORTE-01-0145-FEDER-000031 and R&D&I INNOVMAR - NORTE-01-0145-FEDER-000035, NOVELMAR and the Portuguese Polar Program (PROPOLAR) for funding.

Summary

With the increasing incidence of multidrug-resistant bacterial strains and of diseases such as cancer, there is an urgent need to find new potential drugs. Microbial natural products underlie a series of highly important drugs currently used against a wide variety of illnesses. The strategies adopted most recently in the search for new molecules rely on the genetic analysis of biosynthetic gene clusters or genomes, as well as on mining for biosynthetic genes (in particular of the PKS and NRPS types), combined with bioactivity- and structure-guided discovery. The study of microorganisms inhabiting extreme environments is likewise a promising strategy, since a large fraction of these microorganisms is expected to be still unknown and to possess adaptive strategies unique to their habitat, including the production of new molecules.

In this work, we combine both strategies, biosynthetic gene mining and bioactivity testing, to assess the bioactive potential of a polar microbiome, the Dry Valleys of Antarctica. Two main objectives were defined:

(1) The design of primers able to amplify the biosynthetic domains KS and A of PKS and NRPS genes, respectively.

(2) The analysis of microbial diversity from pyrosequencing data, the isolation of microorganisms from Antarctic environmental samples, and a screening of the bioactive potential of the isolates through bioassays.

Two new primer pairs were developed in this work, able to efficiently amplify the biosynthetic domains (KS and A) from bacterial strains belonging to, at least, the phyla Actinobacteria, Cyanobacteria, Proteobacteria and Planctomycetes, and useful for large-scale screening of bacterial isolates and for metagenomic bioprospecting studies. If future validation confirms their efficiency, these primers may become a new standard for PCR-based metagenomic bioprospecting.

The Antarctic environmental samples revealed a great diversity of chemically prolific phyla, in particular Actinobacteria and Cyanobacteria. Bacterial isolates from the phyla Actinobacteria, Firmicutes and Proteobacteria were obtained, as well as fungal species. Many of the isolates showed bioactivity in different assays, in particular a fractionated extract with antimicrobial properties produced by a fungus of the genus Penicillium. Two potential new species from two different genera are presented, which also have the genetic capacity to produce secondary metabolites.

Keywords

Natural products, secondary metabolism, primers, microbial diversity, PKS, NRPS, bioactivity

Abstract

With the rising incidence of multidrug-resistant bacterial strains and of diseases such as cancer, there is an urgent need to find new potential drugs. Microbial natural products have yielded a variety of pharmaceutically important compounds in current use. Present strategies for finding new molecules rely on the genetic analysis of biosynthetic gene clusters or genomes and on gene mining (for PKS and NRPS genes), combined with bioactivity- and structure-guided discovery. The study of microorganisms inhabiting extreme environments is also regarded as a promising strategy, as a large fraction of their microbiota is expected to be still unknown and these organisms are expected to possess unique adaptations to their habitats, including the production of novel molecules.

Here, we combine biosynthetic gene mining and bioactivity-guided strategies to survey the bioactive potential of a polar microbiome, the McMurdo Dry Valleys, in Antarctica.

To achieve this, two main objectives were pursued:

(1) the design of primer pairs to amplify the KS and A domains of PKS and NRPS genes, respectively, from a wide range of chemically prolific bacterial phyla; and

(2) biodiversity analysis of pyrosequencing data, isolation and growth of microorganisms from Antarctic environmental samples, and screening of the bioactive potential of the isolates through in vitro assays.

Improved primer pairs were obtained that efficiently amplify the biosynthetic domains from bacterial strains of, at least, Actinobacteria, Cyanobacteria, Proteobacteria and Planctomycetes; they are useful for large-scale screening of bacterial isolates and for bioprospecting in metagenomic studies. If further validation confirms their efficiency, our primers may become a new standard for PCR-based metagenomic bioprospecting.

The Antarctic samples were found to harbour a large diversity of prolific phyla, mainly Actinobacteria and Cyanobacteria. Bacterial strains from the phyla Actinobacteria, Firmicutes and Proteobacteria, as well as fungal strains, were isolated. Bioactivity is reported for the first time for several strains, and a potential antimicrobial compound from a fungus of the genus Penicillium is described. Furthermore, two potential novel species from two genera are reported which, according to the biosynthetic domain mining, are worth exploring.

Keywords

natural products, secondary metabolism, primers, microbial diversity, PKS, NRPS, bioactivity

Table of contents

Acknowledgements

Summary

Abstract

List of Figures

List of Tables

List of presentations

List of abbreviations

I – Introduction

1 - A new era in Natural products - Gene and Genome mining for discovery of (novel) molecules

1.1 - Culture-dependent approach

1.2 - Culture independent approach – Metagenomics

Chapter 1 - Design of primer pairs targeting the biosynthetic domains of PKS and NRPS genes in Bacteria

I – Background

II – Goals

III – Materials and Methods

1 - Design of oligonucleotide primers

1.1 - Sequence retrieval for KS domain of Type I PKS genes and AD domain of NRPS genes

1.2 - Multiple-sequence alignment and Phylogenetic Analysis

1.3 - In silico analysis of primers reliability

1.5 - Optimization of the PCR Amplification protocol

1.5.1 – Genomic DNA extraction and quantification

1.5.2 – Optimization of PCR reactions: reagent and thermal conditions

1.5.3 – Comparison of amplification results for protocols using designed primers or literature primers

1.6 - Sequencing of PCR products – Test and Phylogenetic analysis

1.6.1 – Phylogenetic and NaPDoS analysis

IV – Results

V - Discussion

Chapter 2 – Biodiversity and bioactive potential of the McMurdo Dry Valleys, Antarctica

I – Background

II – Goals

1 – Biodiversity of a Soil Transect and of a Rock with endolithic colonization in Victoria Valley, Victoria Land, Antarctica

1.1 - Sample collection

1.2 - eDNA Extraction and 16S rRNA gene sequencing

1.2.1 - QIIME Analysis of the 16S rRNA gene

1.3 - Prediction of the microbiome metabolic capacity using PICRUSt

2 - Isolation of Microorganisms from a Soil Transect and endolithic sample from Victoria Valley

2.1 – Culture strategies – Soil samples T5 and T6

2.2 - Identification of Bacterial and Fungal Isolates through 16S rRNA and ITS gene amplification and Phylogenetic analysis

2.2.1 - Identification of bacterial and fungal isolates using FTA Indicating Micro Cards (Whatman™)

2.2.2 – Identification of bacterial isolates through 16S rRNA gene amplification

2.2.2.1 – DNA extraction

2.2.2.2 – PCR Amplification of the 16S rRNA gene

2.3 – Phylogenetic analysis

3 - Screening by PCR of PKS and NRPS genes in bacterial isolates

4 - Preparation of organic extracts for Bioactivity-Guided Isolation of Bioactive Molecules

4.1 – Organic extraction for Bioactivity Screenings

4.2 - Organic extraction (methanol) and fractionation (VLC) from the Penicillium citrinum strain

4.2.1 – Organic extraction

4.2.2 – Fractionation of the organic extract

4.2.2.1 - Flash-chromatography of fraction 31 B

5 - Bioassays

5.1 - Antimicrobial screening susceptibility assay

5.2 - MTT Assay

IV - Results

1 - Biodiversity of a soil transect and of a rock with endolithic colonization in Victoria Valley, Victoria Land, Antarctica

1.1 - Alpha-diversity

1.2 – Beta-diversity

1.3 – Taxonomic composition

1.4 – Predicted Functional profile from 16S rRNA gene

2 - Biodiversity of culturable strains from the McMurdo Dry Valleys, Antarctica

2.1 – Isolation, identification and phylogenetic analysis of obtained Isolates

2.1.2 – Actinobacterial strains

2.1.3 – Proteobacteria Isolates

2.1.4 – Fungi Isolates

3 – Bioactive potential of Isolated Microorganisms

3.1 – PCR Screening of bacterial isolates: PKS and NRPS genes

3.2 – Bioassay Screening

3.2.1 – Antimicrobial Assay

3.2.2 – Cytotoxic Assay

V - Discussion

1 - Biodiversity and Functional Profile of Endolithic and Soil Microbiomes from the McMurdo Dry Valleys

2 - Culture-dependent Isolation of Actinobacteria from McMurdo Dry Valleys

2 - Bioactive potential from McMurdo Dry Valleys Microbial Isolates

VI – General Conclusion

VII - References

List of Figures

Figure 1 - Phylogenetic tree of the KS domain sequences collected for primer design

Figure 2 - Phylogenetic tree of the AD domain sequences collected for primer design

Figure 3 - PCR amplification of KS and AD domain from cyanobacterial gDNA.

Figure 4 - PCR amplification of KS and AD domain from gDNA of Planctomycetes and Streptomyces strains.

Figure 5 - Electrophoresis gel of PCR products of KS and AD domains amplification using the optimized conditions.

Figure 6 - Phylogenetic tree encompassing the KS domain sequences used for primer design

Figure 7 - Structures of molecules produced by Antarctic Microorganisms, presented on Table 5.

Figure 8 - Location of sampling points in Victoria Valley (marked in red).

Figure 9 - (A) Fractionation apparatus; (B) Filtration of fraction through cotton.

Figure 10 - Rarefaction curves for alpha-diversity metrics. (A) chao1; (B) Phylogenetic diversity;

Figure 11 - PCoA plots using the unweighted (A) and weighted (B) UniFrac metrics.

Figure 12 - (A) Bar chart of frequency of phyla-affiliated OTUs per sampling point taxonomy summary.

Figure 13 - Microphotographs of Paenisporosarcina sp. isolates.

Figure 14 - Phylogenetic tree of the 16S rRNA nucleotide sequences of the obtained isolates (2F, 2H, 13F, 13G, 16D, 16E, 17, 34, 36, 39, 47) from Firmicutes phylum and the closest matches at NCBI 16S database.

Figure 15 - Photographs of Actinobacterial strains isolated from soil sample T5.

Figure 16 - Phylogenetic tree of the 16S rRNA nucleotide sequences of the obtained isolates from Actinobacteria phylum and the closest matches at NCBI 16S database.

Figure 17 - Photographs of Fungi strains isolated from soil sample T5 and T6.

Figure 18 - Phylogenetic tree of the ITS and D1/D2 rDNA nucleotide sequences of the obtained Fungi isolates and the closest matches at NCBI nucleotide collection.

Figure 19 - PCR amplification of KS using primer pair degK2F/degK2R and DKF/DK..

Figure 20 - PCR amplification of AD domain using primer pair A3F/A7R.

Figure 21 - PCR amplification of KS and AD domain using primer pairs degK2F/degK2R and A3F/A7R, respectively.

Figure 22 - Photographic record of inhibition halos.

Figure 23 - Photographic record of inhibition halos.

Figure 24 - Photographic record of inhibition halos of the subfractions tested.

Figure 25 - Percentage of cell viability in the tumor cell line SH-SY5Y (neuroblastoma), after 24h and 48h of exposure to organic extracts of Actinobacteria isolates.

Figure 26 - Percentage of cell viability in the tumor cell line T47-D (breast ductal carcinoma)

Figure 27 - Percentage of cell viability in the tumor cell line SH-SY5Y (neuroblastoma), after 24h and 48h of exposure to VLC fractions of Penicillium citrinum strain 31.

Figure 28 - Percentage of cell viability in the tumor cell line T47-D (breast ductal carcinoma), after 24h and 48h of exposure to VLC fractions of Penicillium citrinum strain 31.

Figure 29 - Percentage of cell viability in the tumor cell line SH-SY5Y (neuroblastoma), after 24h and 48h of exposure to VLC sub-fractions (fraction B) of Penicillium citrinum strain 31.

Figure 30 - Percentage of cell viability in the tumor cell line T47-D (breast ductal carcinoma), after 24h and 48h of exposure to VLC sub-fractions (fraction B) of Penicillium citrinum strain 31

Figure S 1 - Electrophoresis gel of PCR products of KS domain amplification using the optimized conditions. (A) using primer pair KSF1/KS_v2Rv and (B) using primer pair degK2F/degK2R.

List of Tables

Table 1 - Principal domains present in PKS and NRPS enzymes and their associated functions.

Table 2 - Some of the principal available bioinformatic tools and databases for supporting natural products discovery, with relevance for this study.

Table 4 - List of primer pairs published for amplification of KS and AD domains of PKS and NRPS genes, respectively.

Table 5 - Distribution of KS and AD domain sequences selected from the 10 bacterial phyla and group in study.

Table 6 - Example of new bioactive molecules retrieved from Antarctic Microorganisms. The respective structures are depicted below on Figure 7.

Table 7 - List of primers used in this work.

Table 8 - Solvent mixtures (eluents) utilized in the fractionation of the crude extract from Penicillium citrinum strain 31

Table 9 - Solvent mixtures used for elution on Flash-Chromatography of fraction 31B.

Table 10 - PICRUSt KEGG pathways.

Table 11 - Antimicrobial activity of organic extracts tested.

Table 12 - Antimicrobial activity of VLC fractions from the crude extract of Penicillium citrinum strain 31.

Table 13 - Antimicrobial activity of flash-chromatography sub-fractions from 31B-1 to 31B-9.

Table 14 - Summary table of obtained isolates and results from PKS/NRPS genes and bioassays screening.

Table S 1 - Information of nucleotide sequences of KS domain collected for primer design.

Table S 2 - Information of nucleotide sequences of AD domain collected for primer design.

Table S 3 - Primer pairs designed for KS and AD domain.

Table S 4 - Information of bacterial strains used for primer testing, including antiSMASH genome analysis.

Table S 5 - Detailed information of alpha-diversity measure obtained for each sample in study, including number of sequences, average of Phylogenetic diversity, chao1, observed OTUs metrics.

Table S 6 - Summary table of taxonomic frequency distributions at Genus level for Cyanobacteria.

Table S 7 - Summary table of taxonomic frequency distributions at Phylum level.

List of presentations

Rego A, Costa MS, Ramos V, Hong SG, Vasconcelos V, Magalhães C, Leão P (2017). Extreme Polar Microorganisms: Biodiversity and Chemodiversity. IJUP 2017, Porto, Portugal, February 8-10.

Oral communication

Rego A, Costa MS, Ramos V, Vasconcelos V, Baptista M, Carvalho F, Magalhães C, Leão P (2017). Biodiversity and Bioactive Potential of Antarctic Microbiomes. Bioinformatics Open Days, Braga, Portugal, February 22-24.

Poster presentation

Rego A, Costa MS, Ramos V, Hong SG, Vasconcelos V, Magalhães C, Leão P (2017). Exploring Antarctic Microbiomes as Source of Bioactive Molecules. XIIth SCAR Biology Symposium, Leuven, Belgium, 10-14 July 2017.

Oral communication

Rego A, Costa MS, Ramos V, Vasconcelos V, Magalhães C, Leão P (2016). Biodiversity and Chemodiversity of Extreme Polar Bacteria. IJUP 16, Porto, Portugal, February 17-19.

Oral communication

Rego A, Costa MS, Ramos V, Hong SG, Vasconcelos V, Magalhães C, Leão P (2016). Extreme Polar Microorganisms: A Biotechnological Approach. VIII Conferência Portuguesa de Ciências Polares, Lisboa, Portugal, October 26-28.

List of abbreviations

16S rRNA 16S ribosomal RNA

A(D) Adenylation domain

ACP Acyl carrier protein

AIA Actinomycete Isolation Agar

BGC Biosynthetic gene cluster

BLAST Basic Local Alignment Search Tool

bp Base pair(s)

CTAB Cetyltrimethylammonium bromide-polyvinylpyrrolidone-β-mercaptoethanol

DMEM Dulbecco's Modified Eagle Medium

DMSO Dimethyl sulfoxide

dNTP Deoxyribonucleotide triphosphate

e.g. Exempli gratia

END Endolithic sample

ER Enoyl reductase

ICTAR International Centre of Terrestrial Antarctic Research

ITS Internal transcribed spacer

KEGG Kyoto Encyclopedia of Genes and Genomes

KO KEGG Orthologs

KS Ketosynthase

LB Luria Broth

MB Marine Broth

MH Mueller-Hinton

MNPS Modified nutrient-poor sediment

MTT 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide

NCBI National Center for Biotechnology Information

NGS Next-Generation Sequencing

NJ Neighbor-joining

NP(s) Natural product(s)

NRP Nonribosomal peptide

NRPS Nonribosomal peptide synthetase

OTU Operational Taxonomic Unit

PCoA Principal Coordinate Analysis

PCP Peptidyl Carrier Protein

PCR Polymerase chain reaction

PD Phylogenetic diversity

PK Polyketide

PKS Polyketide synthase

PRISM Prediction Informatics for Secondary Metabolomes

PUFA Polyunsaturated fatty acid

PVC Planctomycetes, Verrucomicrobia and Chlamydiae

QIIME Quantitative insights into microbial ecology

RB Round-bottom

RKO Colon carcinoma cell line

SH-SY5Y Neuroblastoma cell line

TAE Tris-acetate-EDTA

TLC Thin layer chromatography

UV Ultraviolet

I – Introduction

Since Alexander Fleming reported the discovery of penicillin in 1929 [1], microorganisms have been recognized as a rich source of bioactive compounds and have yielded a plethora of medically important compounds [2]. Fungi from the phylum Ascomycota and Bacteria, specifically the phyla Actinobacteria, Cyanobacteria, Proteobacteria and Firmicutes, are considered prolific producers of natural products [3]. With the current prevalence of diseases such as cancer and the increasing incidence of multidrug-resistant bacterial infections [4], it is urgent to find and develop new potential drugs. Microbial natural products (NPs) are considered among the most promising and reliable sources of novel drug leads [4].

Despite their structural and chemical diversity, a large fraction of bioactive microbial natural products belong to the polyketide (PK) and nonribosomal peptide (NRP) biogenetic families, or are hybrids of the two. Clinically important drugs, including antibiotics (e.g. the PKs erythromycin and tetracyclines, and the NRP penicillin G [5]), anticancer chemotherapeutics (e.g. the hybrid NRP-PK didemnin [6]) and immunosuppressants [7] (e.g. cyclosporine, an NRP [8]), belong to these natural product families [3].

Polyketide synthases (PKSs) and nonribosomal peptide synthetases (NRPSs) are the families of enzyme complexes responsible for PK and NRP biosynthesis [9]. They are integral parts of biosynthetic gene clusters (BGCs), i.e. sets of two or more genes physically grouped on the genome that encode the biosynthetic pathway for the production of a natural product [10]. These enzymes are organized in modules and act by sequential thiotemplated assembly of acyl-CoA (PKS) and amino acid (NRPS) building blocks, catalysing C-C and C-N bond formation, respectively [11,12].

A PKS module is minimally composed of a ketosynthase (KS), an acyltransferase (AT) and a thiolation (T) domain (the latter also referred to as Acyl Carrier Protein, ACP). It may additionally possess ketoreductase (KR), dehydratase (DH) and enoyl reductase (ER) domains [13] (see Table 1 for a description of each domain's function). To date, PKSs are classified into three classes according to the organization of their catalytic domains (non-modular, monomodular and multimodular) and into subclasses according to their mode of action (e.g. iterative, non-iterative, cis-AT or trans-AT) [13,14]. Type I PKSs employ a multimodular strategy in which each module contains specific catalytic domains for the recognition, activation and condensation of acyl-CoA, whereas in Type II and Type III PKSs each catalytic site resides on a separate protein [3,15]. Type I PKSs are responsible for the biosynthesis of macrolides, polyethers and polyenes, whereas Type II PKSs are usually associated with the formation of cyclic aromatic, often polycyclic, PK compounds [16]. Type III PKSs, also referred to as chalcone synthase-like PKSs and found predominantly in plants, are condensing enzymes that lack a T domain and typically act directly on acyl-CoA substrates [17].

NRPSs resemble type I PKSs in their modular organization, with each module incorporating and processing one amino acid [16]. Analogously to a PKS module, an NRPS module is minimally composed of a condensation (C), an adenylation (A) and a T domain (the latter also referred to as Peptidyl Carrier Protein, PCP) [12], and may optionally possess cyclization (Cy), epimerization (E), methyltransferase (MT) and ketoreductase (KR) domains, among others [9] (Table 1).
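The minimal module compositions described above can be expressed as a small lookup. The following Python sketch is illustrative only: the function name `classify_module` and the representation of a module as a collection of domain labels are assumptions made for this example, and real domain annotation works on sequences rather than on pre-assigned labels.

```python
# Minimal (core) domain sets, as stated in the text above.
PKS_CORE = frozenset({"KS", "AT", "T"})   # ketosynthase, acyltransferase, thiolation
NRPS_CORE = frozenset({"C", "A", "T"})    # condensation, adenylation, thiolation


def classify_module(domains):
    """Classify a module from its domain labels (hypothetical helper).

    Returns "PKS" or "NRPS" when exactly one core set is fully present,
    "ambiguous" when both are, and "unknown" when neither is.
    """
    present = set(domains)
    has_pks = PKS_CORE <= present
    has_nrps = NRPS_CORE <= present
    if has_pks and has_nrps:
        return "ambiguous"
    if has_pks:
        return "PKS"
    if has_nrps:
        return "NRPS"
    return "unknown"


if __name__ == "__main__":
    # A PKS module carrying an optional ketoreductase domain.
    print(classify_module(["KS", "AT", "KR", "T"]))
    # An NRPS module carrying an optional epimerization domain.
    print(classify_module(["C", "A", "T", "E"]))
```

Optional domains (KR, DH, ER, Cy, E, MT) are deliberately ignored by the check, since only the core set decides the module type in this simplified view.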

The enzymatic domains KS, A and C, in particular, possess highly conserved core motifs [18]. Type I PKSs are frequently associated with NRPSs, co-occurring as part of the same BGC and resulting in hybrid molecules with increased structural diversity compared with non-hybrid PKs or NRPs.

Traditionally, the prospection for novel bioactive compounds depends on the cultivation of microorganisms, followed by bioactivity screening of their organic (or sometimes aqueous) extracts. Bioactivity-guided isolation, through consecutive rounds of fractionation and bioassay testing, is then typically employed until the pure compound is isolated. This approach has uncovered over 11,000 PK and NRP products [19].

However, with the advance in DNA sequencing brought by Next-Generation Sequencing (NGS) technologies [20], an explosion of genome sequences and genome-mining studies has revealed microorganisms to be an underexplored source of PK and NRP compounds, with only a small fraction (10%) of PKS-NRPS gene clusters being associated with known products [19]. This realization triggered a renewed interest and


Table 1 - Principal domains present in PKS and NRPS enzymes and their associated functions. Adapted from Bachmann and Ravel 2009 [22] and Adamek et al. 2017 [23].

Domain | Essential function
AT – Acyltransferase | Selection and activation of the acyl-CoA substrate through acylation
KS – Ketosynthase | Catalyses C-C bond formation through Claisen condensation
KR – Ketoreductase | Reduction of keto groups to hydroxyl groups
DH – Dehydratase | Dehydration of the hydroxyl group to an α,β-enoyl
ER – Enoyl reductase | Reduction of the enoyl double bond
T – Thiolation | Phosphopantetheinylated carrier protein shared by PKS and NRPS (also commonly referred to as ACP domain for PKSs and PCP domain for NRPSs)
TE – Thioesterase | Cleavage of the mature PK/NRP via macrocyclization or release of the full-length chain
MT – Methyltransferase | Methylation of PK, and N-methylation of NRP
C – Condensation | Formation of the peptide bond
A – Adenylation | Amino acid activation via intermediary adenylation
E – Epimerization | Epimerization of amino acids, inverting their stereochemistry
Re – Reductase | Reduction (usually terminal) of the mature PK/NRP, resulting in an aldehyde
Cy – Cyclization | Formation of a peptide bond and subsequent amino acid cyclization

1 - A new era in Natural products - Gene and Genome mining for discovery of (novel) molecules

1.1 - Culture-dependent approach

Despite the usefulness of NPs, pharmaceutical companies showed growing disinterest in NP drug discovery programmes in the last decades of the previous century, in part due to the repeated isolation of known compounds [24], to the laborious, time-consuming and expensive procedures needed to isolate them, and to the difficulty of obtaining synthetic analogues [25]. At the same time, the emergence of combinatorial chemistry promised plenty of new compounds for bioactivity screening [26].

The advance of new sequencing technologies at the beginning of the 21st century [20] ushered in a new golden era of natural products discovery [25]. The exponential increase in the number of genome sequences available in databases made possible the development of different bioinformatic tools (Table 2), providing a more targeted discovery and overcoming most of the disadvantages referred to above. This development was the basis for an explosion of genome-mining studies supporting natural products discovery [27,28].

The so-called “genome mining” approach is useful for identifying gene clusters potentially responsible for the synthesis of novel compounds. By understanding the composition and regulation of the gene clusters, a targeted isolation and characterization of new PK and NRP molecules can be pursued, reducing prospection time and costs. In addition, the information can be used to help activate “silent” gene clusters and to optimize the conditions for heterologous expression [23], as exemplified by the production of the antibiotic pantocin B in E. coli [29]. This approach revealed unexpected enzymatic diversity and extended the knowledge of the distribution of these enzymes across the three domains of life [9]. Of particular relevance, a positive correlation between genome size and the fraction of the genome allocated to secondary metabolite biosynthesis was reported by Konstantinidis and Tiedje (2004) [30]. It has also been described that genomes larger than 3 Mb are likely to have at least one PK and NRP gene cluster [31], while genomes with fewer than 2000 ORFs are very likely not to possess secondary metabolism-related genes [30].

With the progress in sequencing technologies and the increasing number of DNA sequences deposited in public databases, not only genome mining but also homology-based PCR screening [32] started to be used to screen the biosynthetic potential of strains before large-scale cultivation and/or sequencing of the entire genome. KS, A and C enzymatic domains possess highly conserved core motifs [18], consensus

the primer sets developed were designed to be specific for some bacterial phyla, usually the most prolific ones such as Actinobacteria [33] and Cyanobacteria [34,35], and also for specific genera such as Streptomyces [36] (Table 3), restricting their usefulness.
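Primers that target conserved core motifs across several phyla are commonly written with IUPAC degenerate base codes, so that one primer string stands for a pool of oligonucleotides. As a hedged illustration, the Python sketch below computes the degeneracy of such a primer and enumerates the sequences it represents. The primer string used here is hypothetical and is not one of the primers designed or cited in this work.

```python
from itertools import product

# Standard IUPAC nucleotide codes and the bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every non-degenerate sequence represented by the primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(bases) for bases in product(*choices)]


if __name__ == "__main__":
    example = "GCNATGGAYCC"  # hypothetical primer, for illustration only
    print(degeneracy(example))
    for sequence in expand(example):
        print(sequence)
```

Higher degeneracy broadens the range of templates a primer can match but dilutes each individual oligonucleotide in the pool, which is one reason broad-range primer design involves a trade-off.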

Together with the development of bioinformatic tools directed to the identification of biosynthetic gene clusters and the discovery of secondary metabolites, the amount of available information (genomes and DNA sequences) led to the creation of a series of natural product biosynthesis-related databases [37]. Some examples of the platforms created are antiSMASH [38], a tool directed to the identification of BGCs and catalytic domains through the analysis of entire genomes/BGCs; NaPDoS [11], a web tool directed to the identification of catalytic domains of PKSs and NRPSs; and NRPSpredictor2 [39], a web


Table 2 - Some of the principal available bioinformatic tools and databases for supporting natural products discovery, with relevance for this study. Adapted from Medema and Fischbach 2015 [7] and Adamek et al. 2017 [23].

Tool or database | Web server URL | Brief description | Reference
antiSMASH | https://antismash.secondarymetabolites.org/ | Web-based tool for the automatic genomic identification of BGCs. | [38]
NaPDoS | http://napdos.ucsd.edu/ | Web-based tool for fast identification and analysis of secondary metabolite genes. | [11]
eSNaPD | http://esnapd2.rockefeller.edu/ | Web server that provides an automated analysis tool for surveying secondary metabolite gene cluster diversity in metagenomic studies. | [40]
SMURF | http://jcvi.org/smurf/index.php | Web-based tool that finds secondary metabolite biosynthesis genes and pathways in fungal genomes. | [41]
PRISM | http://magarveylab.ca/prism/ | Computational tool for the identification of BGCs and prediction of genetically encoded NRPs and type I and II PKs. | [24]
NRPS/PKS substrate predictor | http://nrps.igs.umaryland.edu/ | Knowledge-based tool for elucidating domain organization and substrate specificity of NRPSs and PKSs. | [22]
NRPSpredictor2 | http://nrps.informatik.uni-tuebingen.de/ | Predictor of A domain specificity. | [39]

1.2 - Culture independent approach – Metagenomics

The ability to efficiently annotate and predict NP biosynthetic genes, as described above, opened the door to culture-independent natural products discovery. In fact, one of the main barriers faced by traditional natural products research is the need to isolate and grow the microorganisms in the laboratory. The majority of prokaryotes are presumed to reside in oceanic and terrestrial subsurface environments [42], typically remaining inaccessible and unstudied. One gram of soil is expected to contain 10⁷–10¹⁰ prokaryotic cells [42], equivalent to about 10⁶ different genomes [43]. The uncultured microorganisms present in soil and other environmental samples represent a rich reservoir of novel natural products [43]. The strategies currently employed to maximize the recovery into culture of the biodiversity present in a given sample include the use of a variety of media constituents, changes in growth conditions, mimicking environmental conditions, the use of minimal media for oligotrophic sites, serial dilution of the original inoculum, community culture and co-culture, among others [2,43]. Nevertheless, the well-known “great plate count anomaly” [44], which refers to a cultivable fraction of the microbial richness below 1% [45], is still observed today. Hence, a huge fraction of the biodiversity (and associated chemodiversity) is lost during the culture process in the laboratory. Furthermore, because microbial secondary metabolites (i.e. natural products) are sometimes produced in response to a stress or environmental stimulus (e.g. environmental stress, pathogen attack), laboratory cultures of microorganisms under standard growth conditions often do not provide access to the full natural products potential of a given isolate.

Against this backdrop, metagenomics has emerged as a path to reach the uncultured biodiversity 46 and the corresponding biosynthetic diversity. The extraction of DNA from environmental samples (eDNA), i.e. the metagenome, allows, on the one hand, the identification of the bacterial species present (cultured and uncultured) through PCR amplification and sequencing of the 16S rRNA gene 2 and, on the other hand, can provide information on the diversity of biosynthetic genes, through PCR amplification and sequencing of the catalytic domains, usually KS 34, A 47 and C 48 domains. This PCR-based sequence approach can be used to identify clones of known biosynthetic domains present in metagenomic libraries as well as to find totally novel molecules produced by known BGCs 7. Notably, metagenomics associated with heterologous expression of eDNA has enabled the discovery of different natural products from uncultured microorganisms 49. Together with available bioinformatic tools, such as NaPDoS (see Table 2), a web tool useful for assessing BGC diversity through the analysis of phylogenetic relationships of sequence tags from PKS and NRPS genes 11, and eSNaPD (see Table 2), a web-based bioinformatic platform useful for the discovery of BGCs coding for novel NPs using metagenomic data 40, this approach enables the identification of potential new BGCs and, consequently, of novel molecules.

The aforementioned PCR-based approach has been applied in a variety of studies, including biogeographic studies 50–52 that identified hotspots of bioactive potential and, more recently, the prospecting of Antarctic soils 53. In fact, the study of less exploited environments – such as extreme polar environments – has resulted in the discovery of novel species and molecules (reviewed by Wilson and Brimble 2009 54 in “Molecules derived from the extremes of life”). Not only is a large part of the microbial diversity in these environments still unknown, but the microbiota in these habitats has also developed unique adaptations to survive the extreme environmental stresses, including the biosynthesis of exclusive chemical entities with unprecedented biological activities 54,55.

A limitation of this PCR-based approach concerns the primer sets available for amplification of the catalytic domains. The majority of PCR primers for amplification of the biosynthetic domains – mainly KS and A but also C domains – were initially designed to be specific for certain bacterial phyla, typically the most prolific, as is the case of Actinobacteria 33 and Cyanobacteria 35. Often these primers were restricted to a catalytic domain class (e.g. exclusive to the modular subclass of Type I PKS genes 56), in part due to the higher number of sequences available for these bacterial groups. Currently, however, the extremely large number of genome sequences in public databases gives access to sequences from a broad variety of bacterial phyla 57, including the most abundant and chemically prolific ones. This creates an opportunity for the design of universal primers for PKSs and NRPSs, which would provide a more accurate representation of the real biosynthetic diversity present in environmental samples.

Accordingly, we aim to combine state-of-the-art culture-dependent and culture-independent approaches to achieve the overall goal of exploring the diversity of bioactive small molecules from polar microbiomes. Specifically, the objectives of this dissertation are the following, corresponding to Chapters 1, 2 and 3, respectively:

(1) To design primer pairs capable of amplifying the KS and A domains of PKS and NRPS genes, respectively, from a wide range of chemically-prolific bacterial phyla.

(2) To analyse the biodiversity and the biosynthetic richness of polar microbiomes, through amplification and sequencing of the 16S rRNA gene and PICRUSt predictions.

(3) To isolate and grow microorganisms from Antarctic environmental samples and analyse their bioactive potential through in vitro assays.


Table 3 – List of published primer pairs for amplification of the AD and C domains of NRPS genes and the KS domains of PKS genes.

Domain and respective gene | Primer name | Sequence (5’-3’) | Reference | Notes
AD – NRPS (700-800 bp) | A3F | GCSTACSYSATSTACACSTCSGG | 33 | Specific for Actinobacteria
AD – NRPS (700-800 bp) | A7R | SASGTCVCCSGTSCGGTA | 33 | Specific for Actinobacteria
AD – NRPS (480 bp) | NRPS_F | CGCGCGCATGTACTGGACNGGNGAYYT | 53 | Designed for metagenomic studies
AD – NRPS (480 bp) | NRPS_R | GGTCCGCGGGACGTARTCNARRTC | 53 | Designed for metagenomic studies
AD – NRPS (300 bp) | A2gamF | AAGGCNGGCGSBGCSTAYSTGCC | 58 | Conserved motif A2
AD – NRPS (300 bp) | A3gamR | TTGGGBIKBCCGGTSGINCCSGAGGTG | 58 | Conserved motif A3
AD – NRPS (1000 bp) | MTF2 | GCNGG(C/T)GG(C/T)GCNTA(C/T)GTNCC | 35 | Specific for Cyanobacteria
AD – NRPS (1000 bp) | MTR | GCNGG(C/T)GG(C/T)GCNTA(C/T)GTNCC | 35 | Specific for Cyanobacteria
C – NRPS (700 bp) | CnDmF | ATGCATCACATT(AG)TN(TC)(TC)NGA | 48 | For metagenomic studies
C – NRPS (700 bp) | DCCR | GTGTTNAC(AG)AA(AG)AANCC(AGT)AT | 48 | For metagenomic studies
KS – Type I PKS (700 bp) | degK2F | GCIATGGAYCCICARCARMGIVT | 59 | Specific for Type I modular PKS
KS – Type I PKS (700 bp) | degK2R | GTICCIGTICCRTGISCYTCIAC | 59 | Specific for Type I modular PKS
KS – Type I PKS (1200-1400 bp) | K1F | TSAAGTCSAACATCGGBCA | 33 | Specific for Actinobacteria
KS – Type I PKS (1200-1400 bp) | M6R | CGCAGGTTSCSGTACCAGTA | 33 | Specific for Actinobacteria
KS – Type I PKS (700 bp) | DKF | GTGCCGGTNCC(AG)TG(GATC)G(TC)(TC)TC | 34 | Specific for Type I PKS
KS – Type I PKS (700 bp) | DKR | GCGATGGA(TC)CCNCA(AG)CA(AG)(CA)G | 34 | Specific for Type I PKS
KS – Type I PKS (700 bp) | KSF | CGCTCCATGGAYCCSCARCA | 60 | Specific for Type I PKS
KS – Type I PKS (700 bp) | KSR | GTCCCGGTGCCRTGSSHYTCSA | 60 | Specific for Type I PKS
KSα – Type II PKS (600 bp) | KSα-F | TSGCSTGCTTCGAYGCSATC | 36 | Specific for Streptomyces and Type II PKS
KSα – Type II PKS (600 bp) | KSα-R | TGGAANCCGCCGAABCCGCT | 36 | Specific for Streptomyces and Type II PKS
KSα – Type II PKS (554 bp) | 540F | GGITGCACSTCIGGIMTSGAC | 61 | Specific for Actinobacteria
KSα – Type II PKS (554 bp) | 1100R | CCGATSGCICCSAGIGAGTG | 61 | Specific for Actinobacteria
KSβ – Type II PKS (1500 bp) | dp:KSα | TTCGGCGGXTTCCAGTCXGCCATG | 62 | Specific for iterative Type II KSβ PKS
KSβ – Type II PKS (1500 bp) | dp:ACP | TCCAGCAGCGCCAXCGACTCGTAXCC | 62 | Specific for iterative Type II KSβ PKS
KSβ – Type II (350 bp) | PKS_F | GGCAACGCCTACCACATGCANGGNYT | 53 | Designed for metagenomic studies


Chapter 1 - Design of primer pairs targeting the biosynthetic domains of PKS and NRPS genes in Bacteria


I – Background

PKSs and NRPSs are megaenzymes responsible for the biosynthesis of a large fraction of the NPs of pharmacological importance 63. With the recent advances in genome sequencing, it is now recognized that biosynthetic potential is not restricted to the most prolific and well-studied phyla in this regard, such as Actinobacteria and Cyanobacteria 3, but is widespread throughout the tree of life 9. Recent bioinformatic studies have suggested that under-explored bacterial groups, previously considered poor in NPs, actually possess the genetic potential to produce secondary metabolites of the NRPS and PKS types 3. Examples are members of the bacterial phyla Verrucomicrobia, Chlamydiae and Elusimicrobia.

PCR-based strategies using primers targeting biosynthetic genes, such as the KS domain 34 of PKSs and the AD 47 or C 48 domains of NRPSs, have been used to assess the bioactive potential of bacterial isolates. More recently, this approach has also been employed to survey the biosynthetic potential of microbiomes/metagenomes directly from eDNA 50–52. However, this strategy is currently limited by the use of primers originally designed to be specific to some bacterial phyla, mainly Actinobacteria 33,36. As such, with the current molecular tools for PCR-based screening (which is predicted to become even more frequent owing to NGS technologies), some biosynthetic potential remains unreachable, in particular that of phyla that have traditionally been neglected in terms of secondary metabolite biosynthesis.

Against this backdrop, we envision that better-performing, universal primer pairs for PKSs and NRPSs can prove useful for large surveys of biosynthetic potential from eDNA, which we expect to become ever more common.

II – Goals

Here, we aimed to design “universal” primer pairs, amenable to NGS-sequencing studies, able to amplify the biosynthetic domains of genes responsible for the production of pharmaceutically-relevant NPs (in this case PKs and NRPs) from a wide range of bacterial phyla (Cyanobacteria, Firmicutes, Chloroflexi, Actinobacteria, Bacteroidetes, Planctomycetes, Verrucomicrobia and Chlamydiae group (PVC), Deltaproteobacteria, Alphaproteobacteria, Betaproteobacteria and Gammaproteobacteria).


III – Materials and Methods

1 - Design of oligonucleotide primers

1.1 - Sequence retrieval for KS domain of Type I PKS genes and AD domain of NRPS genes

KS domain sequences were retrieved for the seven described groups of Type I PKS (Enediyne, PUFA, Trans-AT, Hybrid, Iterative, Modular and KS1 11). Ten bacterial phyla and one bacterial group (PVC), i.e. the most abundant according to the most recent tree of life 57, were selected with the intent of covering most of the potential biodiversity. The selected phyla are described in Table 4.

Amino acid (aa) sequences of KS domains from already characterized molecules of each PKS class were retrieved from the NaPDoS database 11. The aa sequences were submitted to the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST) for a tblastn search, and the corresponding nucleotide sequences were retrieved. These nucleotide sequences then served as the initial query for a blastn search against the NCBI nucleotide collection database for each of the bacterial phyla. The genomes/biosynthetic gene clusters containing homologues were analysed with antiSMASH 3.0 38. The protein sequences of the KS domains identified by antiSMASH were recovered and, through a tblastn search, the corresponding nucleotide sequences were obtained. All nucleotide sequences were then analysed with NaPDoS to determine their class and grouped according to the result. The resulting nucleotide sequences also served as queries for the remaining ketosynthase classes.

For the A domain of NRPS genes, the NORINE 37,64 and NaPDoS databases were used to


Table 4 - Distribution of KS and AD domain sequences selected from the 10 bacterial phyla and one group (PVC) under study.

Bacterial phylum | Number of KS domain sequences retrieved | Number of NRPS (AD) domain sequences retrieved
Actinobacteria | 8 | 11
Cyanobacteria | 8 | 11
Firmicutes | 10 | 14
Chloroflexi | 5 | 4
Bacteroidetes | 5 | 10
Elusimicrobia | 1 | 2
PVC Group | 8 | 3
Deltaproteobacteria | 12 | 10
Alphaproteobacteria | 7 | 9
Betaproteobacteria | 6 | 7
Gammaproteobacteria | 6 | 9
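The per-phylum counts in Table 4 can be cross-checked against the totals of 76 KS and 90 A domain sequences reported in the next section. The short sketch below only re-adds the numbers transcribed from the table; the variable names are illustrative.

```python
# Counts transcribed from Table 4: phylum -> (KS domain, AD domain).
TABLE_4 = {
    "Actinobacteria": (8, 11),
    "Cyanobacteria": (8, 11),
    "Firmicutes": (10, 14),
    "Chloroflexi": (5, 4),
    "Bacteroidetes": (5, 10),
    "Elusimicrobia": (1, 2),
    "PVC Group": (8, 3),
    "Deltaproteobacteria": (12, 10),
    "Alphaproteobacteria": (7, 9),
    "Betaproteobacteria": (6, 7),
    "Gammaproteobacteria": (6, 9),
}

# Column totals across all phyla and the PVC group.
ks_total = sum(ks for ks, _ in TABLE_4.values())
ad_total = sum(ad for _, ad in TABLE_4.values())
print(f"KS domain sequences: {ks_total}; AD domain sequences: {ad_total}")
```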

1.2 - Multiple-sequence alignment and Phylogenetic Analysis

A total of 76 KS domain sequences (Supplementary Table S 1) and 90 A domain sequences (Supplementary Table S 2), collected from NCBI, were used. The nucleotide sequences were aligned with ClustalW (default parameters) in MEGA7 65. The alignments were manually reviewed to remove short sequences and their extremities were trimmed. For primer design, conserved sites were fixed at 80% in MEGA7 and the alignment was submitted to Geneious 8.1.9 66, with a consensus threshold of 75% to favour the determination of the most conserved sites. The selected conserved sites were inspected and the degenerate primer pairs were designed manually. The aligned sequences were also subjected to a phylogenetic analysis with MEGA7. Phylogenies were reconstructed using Neighbor-Joining (NJ), with 1000 bootstrap replicates.
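As a rough illustration of how conserved alignment columns translate into a degenerate primer, the sketch below collapses each column into an IUPAC code and counts the resulting degeneracy. It is not the procedure implemented in MEGA7 or Geneious: the function names, the toy alignment and the way the 75% threshold is applied (keeping the most frequent bases until they cover that fraction of the sequences) are assumptions made for illustration only.

```python
from collections import Counter

# IUPAC nucleotide codes keyed by the set of bases they represent.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}
BASES_FOR = {code: bases for bases, code in IUPAC.items()}


def consensus_column(column, threshold=0.75):
    """Return the IUPAC code covering at least `threshold` of the bases in a column."""
    counts = Counter(b for b in column.upper() if b in "ACGT")  # gaps are ignored
    total = sum(counts.values())
    if total == 0:
        return "N"
    kept, covered = set(), 0
    # Add bases from most to least frequent until the threshold is reached.
    for base, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        kept.add(base)
        covered += n
        if covered / total >= threshold:
            break
    return IUPAC[frozenset(kept)]


def degenerate_consensus(alignment, threshold=0.75):
    """Collapse equally long aligned sequences into one degenerate sequence."""
    return "".join(consensus_column("".join(col), threshold) for col in zip(*alignment))


def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    result = 1
    for code in primer.upper():
        result *= len(BASES_FOR[code])
    return result


# Toy alignment (hypothetical, not taken from the thesis data).
toy = ["ATGGAC", "ATGGAT", "ATGGAC", "ATCGAT"]
primer = degenerate_consensus(toy)
print(primer, degeneracy(primer))
```

For the toy alignment this yields ATGGAY, which encodes two sequences. Lowering degeneracy in this way narrows the pool of oligonucleotides in the reaction at the cost of coverage. Inosine (I), used in some published primers, is not handled by this sketch.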

1.3 - In silico analysis of primers reliability

The designed primer sequences (Supplementary Table S 3) were submitted to a variety of online servers to calculate their properties and to test their predicted performance by virtual PCR. For oligonucleotide primer properties, the primer sequences were submitted to: OligoCalc version 3.27 67, OligoAnalyzer 3.1 Tool (Integrated DNA Technologies, http://eu.idtdna.com/calc/analyzer) and Multiple primer analyser (ThermoFisher Scientific, https://www.thermofisher.com/pt/en/home/brands/thermo-


Virtual PCR amplifications were performed with In silico simulation of molecular biology experiments (http://insilico.ehu.es/) 68,69, iPCR (product extractor) (http://www.ch.embnet.org/software/iPCR_form.html) and Sequence Manipulation Suite: PCR Products (http://www.bioinformatics.org/sms2/pcr_products.html) 70.
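The idea behind a virtual PCR can be sketched in a few lines: the forward primer and the reverse complement of the reverse primer are located on a template, and the span between them is the predicted product. The code below is a minimal sketch of that idea and not the algorithm used by any of the servers listed above; it assumes perfect matches under IUPAC ambiguity rules, searches a single strand orientation, and uses a hypothetical template and primers.

```python
import re

# IUPAC codes expanded into regular-expression character classes.
IUPAC_RE = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement of a sequence, including IUPAC ambiguity codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def to_regex(primer):
    """Translate a degenerate primer into a regular expression."""
    return "".join(IUPAC_RE[base] for base in primer.upper())


def virtual_pcr(template, forward, reverse):
    """Return (start, end, length) for each predicted product on the given strand.

    Only perfect matches are considered, and only the orientation in which the
    forward primer matches the strand shown is searched.
    """
    template = template.upper()
    fwd = re.compile(to_regex(forward))
    rev = re.compile(to_regex(reverse_complement(reverse)))
    products = []
    for f in fwd.finditer(template):
        r = rev.search(template, f.end())  # reverse site must lie downstream
        if r:
            products.append((f.start(), r.end(), r.end() - f.start()))
    return products


# Hypothetical template and primers, for illustration only.
template = "TTT" + "ATGGACCCGCAGCA" + "A" * 10 + "CATGGTACCGGAAC" + "TTT"
print(virtual_pcr(template, "ATGGAYCCSCARCA", "GTTCCGGTACCRTG"))
```

Real tools additionally tolerate mismatches, scan both strands and report melting temperatures, none of which is attempted here.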

1.4 - Optimization of the PCR Amplification protocol

1.4.1 – Genomic DNA extraction and quantification

Genomic DNA from Cyanobacteria, Actinobacteria, Planctomycetes, Betaproteobacteria, Gammaproteobacteria and Alphaproteobacteria strains was used to test the efficiency of the designed primers. Only bacteria with already sequenced genomes were selected, except for Cyanobacteria, which were selected based on the presence of PKS and NRPS genes as detailed in a previous study 71. The genomes of the selected bacteria were submitted to antiSMASH analysis to survey the presence of PKS and NRPS genes. A detailed list of the bacterial strains used is provided in Supplementary Table S 4.

For DNA extraction, bacteria (except Cyanobacteria) were grown in 5 mL of liquid culture medium in 50 mL Falcon tubes, under constant agitation (100 rpm) at 27 °C. ML14 72, Marine Broth (MB) and Luria Broth (LB) media were used for Planctomycetes, Proteobacteria and Chromobacterium violaceum, respectively. Planctomycetes strains were kindly provided by Olga Lage (CIIMAR/FCUP, Porto, Portugal). Genomic DNA from Streptomyces was a kind gift from Marta Vaz Mendes (i3S, Porto, Portugal). The gDNA was then extracted using the E.Z.N.A.® Bacterial DNA Kit (OMEGA bio-tek). The manufacturer’s instructions were followed and the DNA was eluted with elution buffer in a final volume of 100 µL. The integrity of the gDNA was checked by electrophoresis on a 0.8% agarose gel prepared in 1× Tris-acetate-EDTA (TAE) buffer and stained with 1 µL of SYBR® Safe DNA Gel Stain (ThermoFisher Scientific). One microliter of DNA (with loading dye) was loaded onto each lane and the gel was run at 80 V for 30 minutes. For cyanobacterial gDNA extraction, fresh biomass from Z8 73 liquid medium (about 2 mL) was harvested for each selected cyanobacterium. The gDNA was then extracted and purified using the PureLink Genomic DNA Mini Kit (Invitrogen), applying the Gram Negative Bacterial Cell Protocol.

The DNA concentration was measured in a Qubit® 3.0 Fluorometer (Life Technologies) using a Qubit® dsDNA HS Assay Kit (Life Technologies). The manufacturer’s instructions were followed and 1 µL of each gDNA was used. The gDNAs were normalized to a final concentration of 25 ng µL⁻¹, unless the initial concentration was lower than this value.
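The normalization step follows the usual dilution relation C1 × V1 = C2 × V2. The sketch below shows one way to compute the volumes involved; the function name, the 20 µL working volume and the example Qubit readings are hypothetical and are not taken from the thesis.

```python
def normalise_gdna(stock_ng_per_ul, target_ng_per_ul=25.0, final_volume_ul=20.0):
    """Return (dna_ul, diluent_ul) needed to reach the target concentration.

    Uses C1 * V1 = C2 * V2. Samples already at or below the target are
    used undiluted, mirroring the exception described in the text.
    """
    if stock_ng_per_ul <= 0:
        raise ValueError("stock concentration must be positive")
    if stock_ng_per_ul <= target_ng_per_ul:
        return final_volume_ul, 0.0
    dna_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return dna_ul, final_volume_ul - dna_ul


# Hypothetical Qubit readings in ng/µL (not values from the thesis).
for name, conc in {"strain_A": 100.0, "strain_B": 62.5, "strain_C": 12.0}.items():
    dna, diluent = normalise_gdna(conc)
    print(f"{name}: {dna:.1f} µL gDNA + {diluent:.1f} µL diluent")
```

With these example readings, a 100 ng/µL sample needs 5 µL of gDNA plus 15 µL of diluent, while the 12 ng/µL sample is left undiluted.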

1.4.2. – Optimization of PCR reactions: reagent and thermal conditions

To determine the best PCR conditions, including improved specificity and strong amplification, different conditions and reagents were tested.

Three different Taq DNA polymerases were employed: GoTaq (Promega), DreamTaq (Thermo Fisher Scientific) and TaKaRa Taq Hot Start Version (Clontech). As per the manufacturer’s instructions, the basic PCR protocol used with GoTaq polymerase consisted of: 1× Green GoTaq® Flexi Buffer (Promega), 2.5 mM MgCl2 (Promega), 500 μM dNTP mix (Promega), 1 μM of each primer (STABVIDA), 0.5 U of GoTaq® DNA Polymerase (Promega) and 2 μL of template DNA in a 20 μL reaction. The standard PCR conditions were: initial denaturation at 94 °C for 10 min, followed by 40 cycles of denaturation at 94 °C for 30 s, annealing (at the determined temperature) for 30 s and extension at 72 °C for 1 min, and a final extension at 72 °C for 7 min.

The basic protocol for DreamTaq consisted of: 1× DreamTaq PCR Master Mix, 1 μM of each primer and 2 μL of template DNA in a final volume of 25 μL. The standard PCR conditions were: initial denaturation at 95 °C for 2 min, followed by 30 cycles of denaturation at 95 °C for 30 s, annealing (at the determined temperature) for 30 s and extension for 1 min, and a final extension at 72 °C for 10 min.

The basic protocol for TaKaRa consisted of: 1× TaKaRa PCR Buffer (TAKARA BIO INC), 1.5 mM MgCl2 (TAKARA BIO INC), 250 μM dNTPs (TAKARA BIO INC), 1 μM of each primer (STABVIDA), 0.5 U of TaKaRa Taq™ Hot Start Version (TAKARA BIO INC) and 2 μL of template DNA in a final volume of 20 μL. The PCR conditions were: initial denaturation at 98 °C for 2 min, followed by 40 cycles of denaturation at 94 °C for 30 s, annealing at the determined temperature for 30 s and extension at 72 °C for 1 min, and a final extension at 72 °C for 5 min.

Variations to the reaction mixtures mentioned above were tested and included: gradients of primer, MgCl2 and dNTP concentration, combined with the presence/absence of UltraPure™ BSA (Life Technologies), and a gradient of DMSO (Thermo Scientific) concentration (1-3%). Concerning the thermal cycling protocol, a gradient of annealing temperatures was initially performed for each primer pair to determine the best annealing temperature. Extension time and number of cycles were also optimized. A Touchdown PCR protocol 74 was also employed with TaKaRa Taq Hot Start Version. The protocol consisted of: initial denaturation at 95 °C for 3 min, followed by 10 cycles of denaturation at 95 °C for 30 s, annealing at 75 °C for 45 s and extension at 72 °C for 25 s, followed by 25 cycles of denaturation at 95 °C for 30 s, annealing at 60 °C for 45 s and extension at 72 °C for 25 s, and a final extension at 72 °C for 5 min.
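The two-stage program described above can be written down as data, which makes the cycle count and the minimum run time easy to check. This is only a bookkeeping sketch: the class and function names are illustrative, ramping between temperatures is ignored, and the temperatures and times are the ones stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    label: str
    temp_c: float
    seconds: int


@dataclass(frozen=True)
class Stage:
    steps: tuple
    cycles: int = 1

    def duration(self):
        """Hold time of the stage in seconds (ramping is ignored)."""
        return self.cycles * sum(step.seconds for step in self.steps)


def cycling_stage(anneal_c, cycles):
    """Denaturation/annealing/extension block with the times given in the text."""
    return Stage(
        steps=(
            Step("denaturation", 95, 30),
            Step("annealing", anneal_c, 45),
            Step("extension", 72, 25),
        ),
        cycles=cycles,
    )


# Program as described in the text for TaKaRa Taq Hot Start Version.
program = [
    Stage((Step("initial denaturation", 95, 180),)),
    cycling_stage(anneal_c=75, cycles=10),
    cycling_stage(anneal_c=60, cycles=25),
    Stage((Step("final extension", 72, 300),)),
]

total_seconds = sum(stage.duration() for stage in program)
total_cycles = sum(stage.cycles for stage in program if len(stage.steps) > 1)
print(f"{total_cycles} cycles, {total_seconds / 60:.1f} min of hold time")
```

The program amounts to 35 amplification cycles and 3980 s of hold time, i.e. a little over an hour before ramping is taken into account.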

1.4.3 – Comparison of amplification results for protocols using designed primers or literature primers

The primer pairs degK2F/degK2R 59 and A3F/A7R 33 were used for PCR amplification of the KS and AD domains, respectively, and served as the benchmark against which the designed primers were compared. For the literature primer pairs, thermal cycling was based on the literature protocols and was performed on a Veriti® 96-Well Thermal Cycler (ThermoFisher Scientific). The PCR reactions were prepared in a volume of 20 μL containing 1× TaKaRa PCR Buffer (TAKARA BIO INC), 1.5 mM MgCl2 (TAKARA BIO INC), 250 μM dNTPs (TAKARA BIO INC), 0.625 μL of primer (100 μM), 0.25 mg/mL of UltraPure™ BSA (Life Technologies), 0.5 U of TaKaRa Taq™ Hot Start Version (TAKARA BIO INC) and 2 μL of template DNA. For amplification of the AD domain with primer pair A3F/A7R, the PCR conditions were: initial denaturation at 95 °C for 4 min, followed by 40 cycles of denaturation at 94 °C for 30 s, annealing at 67.5 °C for 30 s and extension at 72 °C for 60 s, and a final extension at 72 °C for 5 min. For primer pair degK2F.i/degK2R.i the conditions were: initial denaturation at 95 °C for 4 min, followed by 40 cycles of denaturation at 94 °C for 40 s, annealing at 56.3 °C for 40 s and extension at 72 °C for 75 s, and a final extension at 72 °C for 5 min. PCR products (10 μL loaded onto each well) were separated by electrophoresis on a 1.5% (w/v) agarose gel for 40 min at 120 V, together with 5 μL of GRS Ladder 1 kb (Grisp). The gel was stained with 1 μL of SYBR® Safe DNA Gel Stain (ThermoFisher Scientific), visualized under UV light on a Gel Doc XR+ System (BIO-RAD) and analysed with Image Lab™ software (BIO-RAD).

1.5 Sequencing of PCR products – Test and Phylogenetic analysis

After optimization of the PCR protocol, an initial test was carried out with the following gDNAs to evaluate primer functionality. Nodosilinea nodulosa LEGE 06152, Cobetia marina CECT 4278, Halomonas aquamarina CECT 5000 and Streptomyces natalensis ATCC 27448 gDNAs were used for KS domain amplification. The AD domain was amplified from gDNA of Nodularia sp. LEGE 06071 and Streptomyces natalensis ATCC 27448. Thermal cycling was performed on a Veriti® 96-Well Thermal Cycler (ThermoFisher Scientific). For amplification of the KS domain, the PCR reaction was prepared in a volume of 20 μL containing 1× TaKaRa PCR Buffer (TAKARA BIO INC), 1.5 mM MgCl2 (TAKARA BIO INC), 250 μM dNTPs (TAKARA BIO INC), 1 μM of each primer (KSF1/KS_v2_Rv, STABVIDA), 3% DMSO, 0.5 U of TaKaRa Taq™ Hot Start Version (TAKARA BIO INC) and 2 μL of template DNA. The PCR conditions were: initial denaturation at 98 °C for 2 min, followed by 40 cycles of denaturation at 94 °C for 30 s, annealing at 55 °C for 30 s and extension at 72 °C for 22 s, and a final extension at 72 °C for 5 min.

For amplification of the AD domain, the reaction was prepared in a volume of 20 μL containing 1× TaKaRa PCR Buffer (TAKARA BIO INC), 1.5 mM MgCl2 (TAKARA BIO INC), 250 μM dNTPs (TAKARA BIO INC), 1 μM of each primer, 0.5 U of TaKaRa Taq™ Hot Start Version (TAKARA BIO INC) and 2 μL of template DNA. The PCR conditions were: initial denaturation at 98 °C for 2 min, followed by 40 cycles of denaturation at 94 °C for 30 s, annealing at 54 °C for 30 s and extension at 72 °C for 22 s, and a final extension at 72 °C for 5 min.

For validation of the PCR reaction, 5 μL of PCR product was separated by electrophoresis on a 1.5% (w/v) agarose gel for 40 min at 120 V. GRS Ladder 100 bp (Grisp) was used (5 μL loaded). The gel was stained with 1 μL of SYBR® Safe DNA Gel Stain (ThermoFisher Scientific), visualized under UV light on a Gel Doc XR+ System and analysed with Image Lab™ software.

After validation of the reaction, PCR products (15 µL loaded onto each well) were separated by electrophoresis on a 1% (w/v) agarose gel for 60 min at 150 V and stained with 1 μL of SYBR® Safe DNA Gel Stain (ThermoFisher Scientific). The bands were visualized under UV light with the Gel Doc XR+ System and excised using sterile scalpels. The bands were purified using the NZYGelpure kit (nzytech) and sequenced by Sanger sequencing at STABVIDA (Portugal). Briefly, the following components were used: BigDye® Terminator v3.1 Cycle Sequencing Kit [Applied Biosystems]; BigDye® Terminator v1.1, v3.1 5× Sequencing Buffer [Applied Biosystems]; primer 10 μM; nuclease-free water (Ambion); purified PCR product. The sequencing products were purified with illustra™ Sephadex™ G-50 Fine DNA Grade and submitted to automated capillary electrophoresis on an ABI 3730xl Genetic Analyzer (Applied Biosystems). Visual quality control of the electropherograms was performed in Sequence Scanner v1.0 (Applied Biosystems). Raw forward and reverse sequences (ab1 files) were submitted to Geneious 8.1.9 66 for de novo assembly. The resulting consensus sequences (average length 300 and 240 bp for the KS and AD domains, respectively) were submitted to NCBI for a blastn search against the nucleotide collection database.

1.5.1 – Phylogenetic and NaPDoS analysis

The obtained sequences were aligned with the sequences used for primer design and submitted to a phylogenetic analysis, to survey the diversity covered. KS domain sequences were also classified using the web tool NaPDoS and a phylogenetic tree (using the NaPDoS database as reference) was constructed.

IV – Results

Alignments composed of 63 and 84 sequences with 1462 and 1506 bp for KS and AD domain, respectively, were obtained as a basis for primer design. For each alignment, a phylogenetic analysis was performed to inspect the diversity covered by the selected sequences. For the KS domain, according to the phylogenetic tree (Figure 1) it is possible to observe that the sequences are diverse and encompass all the documented classes of KSs, with a clear clustering pattern linked to function. Likewise, the phylogeny of the AD domain (Figure 2) included the known diversity of ADs and the clustering pattern was congruent with the type of domain.

A series of conserved regions were selected for primer design. In total, four forward and two reverse primers were designed for the KS domain, in different regions of the gene (Table S 3). For the AD domain, five forward and four reverse primers were designed. Initially, for the KS domain, primer pairs KSF1/KSR1 and KSF2/KSF1 were designed, which, according to the in silico analysis, seemed very robust. For the AD domain, primers ADFw1/ADRv1, ADFw2 and AFw2.1/ADRv2 and ADFw3/ADRv3 were initially designed. However, PCR amplification produced several non-specific bands in both cases, or no bands at all, even after attempts to optimize the conditions. A second iteration of primer design was therefore performed, either by designing primers in new regions or by decreasing primer degeneracy, which yielded the primers designated as v2 (Table S 3).

Several combinations of primer pairs were tested, and the pairs KSF1/KS_v2_Rv and NRPS_v2_Fw/ADRv3 proved to be the most reliable. When PCR amplifications with these designed primers, albeit with a non-optimized protocol, are compared with those obtained with the currently used literature primer pairs, a band of the expected size can be observed in almost all the strains tested (Figure 3 and Figure 4), although the literature primers fared better. Amplification of the AD domain gave a good indication that the designed primers are able to recover a broader range of diversity, as a product of the expected size was obtained from cyanobacterial gDNA with the designed primers but not with the primer pair A3F/A7R (Figure 3). Amplification with gDNA of Proteobacteria yielded similar results (Supplementary Figure S 1).


Figure 1 – Phylogenetic tree of the KS domain sequences collected for primer design and the respective class to which they belong (classes labelled in the tree: PUFA, Enediyne, Modular, Iterative, KS1, Hybrid and Trans-AT). The tree was computed in MEGA7 110, reconstructed using the Neighbor-Joining 182 and bootstrap method (1000 replications) and comprised 67 nucleotide sequences with 1462 bp. Fatty acid synthase (FAS) sequences from E. coli and Streptomyces sp. were included as outgroup.


Figure 2 - Phylogenetic tree of the AD domain sequences collected for primer design. The tree was computed in MEGA7 [63], reconstructed using the Neighbor-Joining [135] and bootstrap method (1000 replications) and comprised 82 nucleotide sequences with 1506 bp. Sequences from the hybrid PKS-NRPS genes mtaD, nosB and blmVIII were included as outgroup.