Academic year: 2019

João Carlos Moreno Ramos

Bachelor’s degree (Licenciado) in Biochemistry

Structure based drug design for the discovery of promising inhibitors of human Bcl-2 and Streptococcus dysgalactiae LytR proteins

Dissertation submitted for the degree of Master in Biochemistry

Supervisor: Dr. Teresa Sacadura Santos Silva, Assistant Researcher, FCT-NOVA

Co-supervisor: Dr. Filipe Miguel dos Santos Freire, Senior Associate Researcher at the Biology for Drug Discovery Unit, IBET/Merck Healthcare Laboratory

Chair: Prof. Dr. José Ricardo Ramos Franco Tavares

Examiner: Dr. Pedro Manuel Henriques Marques Matias

Member: Dr. Teresa Sacadura Santos-Silva

“Structure based drug design for the discovery of promising inhibitors of human Bcl-2 and Streptococcus dysgalactiae LytR proteins”

Copyright held by João Carlos Moreno Ramos, FCT/UNL and UNL


Acknowledgements (Agradecimentos)

I dedicate this dissertation to my parents, Carlos and Helena, to my siblings, Pedro and Sara, and to my grandparents, Zé, Rosa and Neno. Thank you for everything you have done and continue to do for me. I owe you much of what I have and, above all, of what I am.

I thank Professor Maria João Romão for giving me the opportunity to work, during both my undergraduate and Master’s degrees, in the Macromolecular Crystallography group (Xtal).

To Dr. Ana Luísa and Professor Ricardo Franco, for introducing me to research work and to the method of X-ray crystallography, which I intend to keep as one of my main scientific interests in the future.

To the whole Xtal group: Teresa, Ana Luísa, Angelina, Catarina, Benedita, Marino, Márcia, Muthukumaran, Cristiano, Rita, Francisco, Viviana, Diana, Raquel, Filipa, Sónia, and to my Master’s colleagues Frederico and Diogo, for all the support and for the excellent atmosphere that certainly helped me overcome the difficulties that arose throughout this year.

To Dr. Teresa Santos Silva, for her supervision during this year, which involved a great deal of work, high standards and many opportunities for academic and personal enrichment.

To Dr. Filipe Freire, for his unfailing patience and availability while I learned the experimental part of this dissertation. What drew me to this work was chiefly the chance to learn several techniques related to crystallography and drug design, and I could not have had a better teacher. You will remain an example of professionalism and organization that I hope to follow in the future.

To Dr. Jayaraman Muthukumaran, for his remarkable patience and kindness throughout my learning process regarding the bioinformatic and computational approach in this work. You taught me so much, and I hope to keep learning more about all the techniques that we used and that are common in drug design.

To Francisco Leisico, for all the support during the experimental work and for the good-humoured company at the lab bench. I owe you countless football games, and you owe me a few surf lessons.

To Dr. Márcia Correia, for her extraordinary work managing the laboratory and for the patience with which she faced all my questions about reagents, equipment and software.

To the students I worked with, João Ferreira, Kivénia Ferreira and Sara Carvalho, to whom I tried to teach part of my work and the techniques I used. I hope I managed to pass on some knowledge that will help you in the future, despite all my shortcomings.

To my friends, Rafael Rippel, Luis Silva, Diogo Gonçalves, Alexandra Francisco, António Lopéz, Mauro Duarte, Miguel Correia, Dário Valezim, João Veléz, Bruno Guerreiro, Miguel Palhas and Emanuel Moreira, for the fantastic times we spent together and all the memories I will carry with me. Without you, university would have given me very little.

Summary (Resumo)

Drug discovery has evolved over the last decades toward the rational design of active molecules. The ability to study molecular interactions at the atomic level, and the understanding that this knowledge can be applied to drug development, are the premises of structure-based drug design. Combining this approach with the computational methods currently available makes it possible to accelerate the complex process of drug design. In this thesis, this methodology is used to study promising candidate inhibitors of the human Bcl-2 and Streptococcus dysgalactiae LytR proteins.

Half of human cancers are estimated to be related to Bcl-2 overexpression. This protein is responsible for inhibiting the apoptotic mechanism, which is essential for the eradication of dysfunctional cells. When Bcl-2 is overexpressed, these cells do not respond to apoptotic stimuli, whether endogenous or exogenous, such as chemotherapeutic agents. Derivatives of the 4H-chromene and indole families were studied for their promising ability to inhibit Bcl-2. Molecular docking studies revealed sub-micromolar affinities of the 4H-chromene activemethine and indole compounds for the physiological binding site of Bcl-2. Biophysical characterization of the interactions did not yield clear evidence of binding, probably because of the sparse network of interactions with the binding-site residues. X-ray crystallography was directed at determining the structure of these possible protein-ligand complexes, and preliminary co-crystallization conditions were obtained.

Several infectious diseases are related to the biofilm phenotype, which consists of an agglomeration of bacteria enclosed in a matrix. The biofilm gives bacteria increased resistance to the host’s innate immune system and to treatment with generic antibiotics. LytR belongs to the LCP family of proteins, which are thought to be involved in the attachment of anionic polymers to peptidoglycan, protecting Gram-positive bacteria from phagocytosis and cell lysis. Previous bioinformatic screening studies reported ellagic acid and fisetin as promising LytR inhibitors with anti-biofilm activity. Molecular docking revealed that these compounds bind with micromolar affinity to the hypothetical active site of LytR, interacting with key catalytic residues. The biophysical techniques used did not confirm binding of these molecules to the protein, which may be explained by the co-purification of a lipid substrate, a phenomenon that has been reported previously. Mass spectrometry or structure determination, by X-ray crystallography or NMR, may be conclusive regarding the occupation of the LytR active site by this molecule.


Abstract

Drug research has evolved significantly in the last decades toward the rational design of drugs. The capability to study molecular interactions at the atomic level, and to use this knowledge to construct and improve drug candidates, provided the premises of structure-based drug design (SBDD). Combined with the computational methods now available, this approach offers the opportunity to expedite the intricate process of drug discovery. In the present thesis, the SBDD approach was implemented to study promising candidate inhibitors of the human Bcl-2 and the Streptococcus dysgalactiae LytR proteins.

Half of the cancers in humans are estimated to be related to overexpression of the Bcl-2 protein. This macromolecule is responsible for the inhibition of the apoptotic process, which is pivotal for the elimination of abnormal cells. When Bcl-2 is overexpressed, these abnormal cells do not respond to death stimuli, either endogenous or exogenous, such as chemotherapeutic agents, and become immortal. Promising 4H-chromene and indole derivatives were studied regarding their potential to inhibit Bcl-2. Molecular docking studies revealed sub-micromolar binding of the 4H-chromene activemethine and the indole derivatives in the binding groove essential for Bcl-2 biological function. Biophysical characterization did not demonstrate significant evidence of binding between Bcl-2 and the compounds under study, probably due to their small network of interactions with the binding pocket residues. The structure determination process of the protein-ligand complexes achieved preliminary co-crystallization conditions that require further optimization.

Numerous infectious diseases are associated with the bacterial biofilm phenotype, which consists of agglomerates of cells enclosed in a self-produced matrix. Biofilms confer on bacteria improved resistance to the host’s innate immune system and to conventional antibiotics. LytR belongs to the LCP family of proteins, which are thought to be responsible for the attachment of anionic polymers to the peptidoglycan, protecting Gram-positive bacteria from phagocytosis and lysis. Previous virtual screening studies identified ellagic acid and fisetin as promising inhibitors of LytR, displaying anti-biofilm activity. Molecular docking revealed binding of these compounds in the hypothetical active site of LytR, with micromolar affinities and specific interactions with protein residues crucial for catalysis. Biophysical techniques failed to provide evidence of protein-ligand interactions, although this may be related to the possible co-purification with a lipidic substrate, which has been reported before. Mass spectrometry or structural determination, through X-ray crystallography or NMR, should be pivotal to establish evidence of this molecule’s accommodation in the binding pocket.


Table of contents

Acknowledgements (Agradecimentos)
Summary (Resumo)
Abstract
Table of contents
List of figures
1. Introduction
1.1. Drug Discovery
1.1.1. Historical Perspective
1.1.2. Overview
1.1.2.1. Target Identification
1.1.2.2. Target Validation
1.1.2.3. Hit Discovery Process
1.1.2.4. Hit series selection
1.1.2.5. Hit-to-lead phase
1.1.2.6. Lead optimization phase
1.2. Structure-based Virtual Screening (SBVS)
1.2.1. Target structural determination
1.2.1.1. X-ray crystallography
1.2.1.1.1. Protein crystallization
1.2.1.1.2. X-ray diffraction
1.2.1.1.3. Structure determination
1.2.1.2. Homology Modeling
1.2.2. Virtual Screening
1.2.2.1. Ligand-based Virtual Screening
1.2.2.2. Structure-based Virtual Screening
1.2.3. Target-ligand complex characterization
1.2.3.1. Thermal Shift Assay (TSA)
1.2.3.2. Urea-Polyacrylamide Gel Electrophoresis
1.2.3.3. Isothermal Titration Calorimetry (ITC)
1.2.3.4. Circular Dichroism (CD)
1.2.4. Target-ligand structural determination
1.2.4.1. X-ray crystallography of protein-ligand complexes
1.2.4.2. Saturation Transfer Difference – NMR (STD-NMR)
1.2.4.3. Molecular Dynamics (MD)
1.3.1. Human Bcl-2 protein
1.3.2. Streptococcus dysgalactiae LytR
2. Materials and Methods
2.1. Target structural determination
2.1.1. Recombinant protein expression and purification
2.1.1.1. Human Bcl-2 protein
2.1.1.2. Streptococcus dysgalactiae LytR
2.1.2. X-ray crystallography
2.1.2.1. Protein crystallization
2.1.2.1.1. Human Bcl-2 protein
2.1.2.1.2. Streptococcus dysgalactiae LytR
2.1.2.2. X-ray diffraction
2.1.3. Homology Modeling
2.1.3.1. Human Bcl-2 protein
2.1.3.2. Streptococcus dysgalactiae LytR
2.2. Virtual Screening and Molecular Docking
2.2.1. Ligand-based
2.2.1.1. Human Bcl-2 protein
2.2.2. Structure-based
2.2.2.1. Human Bcl-2 protein
2.2.3. Molecular Docking
2.2.3.1. Human Bcl-2 protein
2.2.3.2. Streptococcus dysgalactiae LytR
2.3. Target-ligand complex characterization
2.3.1. Thermal Shift Assay (TSA)
2.3.2. Urea-Polyacrylamide Gel Electrophoresis
2.3.2.1. Human Bcl-2 protein
2.3.3. Isothermal Titration Calorimetry (ITC)
2.3.3.1. Human Bcl-2 protein
2.3.3.2. Streptococcus dysgalactiae LytR
2.3.4. Circular Dichroism (CD)
2.3.4.1. Streptococcus dysgalactiae LytR
2.4. Target-ligand structural determination
2.4.1. X-ray crystallography
2.4.1.1. Human Bcl-2 protein
2.4.3. Molecular Dynamics (MD)
2.4.3.1. Bcl-2 protein-ligand complexes
2.4.3.2. Structural impact of non-synonymous single nucleotide polymorphisms (nsSNP) in Human Bcl-2 protein
3. Results and Discussion
3.1. Target structural determination
3.1.1. Recombinant protein purification
3.1.1.1. Human Bcl-2 protein
3.1.1.2. Streptococcus dysgalactiae LytR
3.1.2. X-ray crystallography
3.1.2.1. Human Bcl-2 protein
3.1.2.2. Streptococcus dysgalactiae LytR
3.1.3. Homology Modeling
3.1.3.1. Human Bcl-2 protein
3.1.3.2. Streptococcus dysgalactiae LytR
3.2. Virtual Screening and Molecular Docking
3.2.1. Human Bcl-2 protein
3.2.3. Streptococcus dysgalactiae LytR
3.3. Target-ligand complex characterization
3.3.1. Thermal Shift Assay (TSA)
3.3.1.1. Human Bcl-2 protein
3.3.1.2. Streptococcus dysgalactiae LytR
3.3.2. Urea-Polyacrylamide Gel Electrophoresis
3.3.2.1. Human Bcl-2 protein
3.3.3. Isothermal Titration Calorimetry (ITC)
3.3.3.1. Human Bcl-2 protein
3.3.3.2. Streptococcus dysgalactiae LytR
3.3.4. Circular Dichroism (CD)
3.3.4.1. Streptococcus dysgalactiae LytR
3.4. Target-ligand structural determination
3.4.1. X-ray crystallography
3.4.1.1. Human Bcl-2 protein
3.4.1.2. Streptococcus dysgalactiae LytR
3.4.2. Saturation Transfer Difference (STD-NMR)
3.4.2.1. Human Bcl-2 protein
3.4.3. Molecular Dynamics (MD)
3.4.3.1. Protein-ligand complexes of Human Bcl-2
3.4.3.2. Structural and functional impact of non-synonymous single nucleotide polymorphisms (nsSNP) on Human Bcl-2 protein
4. Conclusions and future perspectives
5. Bibliography
6. Appendix
6.1. TSA buffer screening
6.2. TSA additive screening
6.3. TSA results of buffer and additive screenings with Bcl-2

List of figures

Figure 1.1 - Chemical structures of the first two active compounds isolated from medicinal plants, morphine and papaverine.
Figure 1.2 - Schematic representation of a drug discovery process.
Figure 1.3 - Schematic representation of the properties assessed during a hit series selection.
Figure 1.4 - Schematic representation of the X-ray crystallography workflow.
Figure 1.5 - Representation of a (triclinic) unit cell with its defining parameters.
Figure 1.6 - Schematic representation of the vapor diffusion method.
Figure 1.7 - Representation of the phase diagram used to guide a protein crystallization endeavour.
Figure 1.8 - Schematic representation of the X-ray diffraction experiment.
Figure 1.9 - Schematic representation of a synchrotron.
Figure 1.10 - Schematic representation of Bragg’s law.
Figure 1.11 - Schematic representation of the normal mode protocol followed by Phyre2.
Figure 1.12 - Schematic representation of the intensive mode protocol used by Phyre2.
Figure 1.13 - Example of bit strings generated for two molecules using a small set of substructures.
Figure 1.14 - Representation of a protein unfolding curve from a TSA.
Figure 1.15 - Schematic representation of an ITC experiment.
Figure 1.16 - Far-UV CD spectra characteristic of various secondary structures.
Figure 1.17 - Schematic representation of the STD-NMR technique.
Figure 1.18 - Energy equation applied to calculate the atoms’ motion in a Molecular Dynamics simulation.
Figure 1.19 - Schematic diagram of the six hallmarks of cancer.
Figure 1.20 - Schematic representation of the sequence homology present in Bcl-2 family members.
Figure 1.21 - Schematic representation of the embedded together model.
Figure 1.22 - Crystal structures of human Bcl-2 protein and Bcl-2 with Bax’s BH3 domain.
Figure 1.23 - Chemical structures of the most representative examples of the three generations of arylsulfonamides.
Figure 1.24 - Chemical structure of the reference 4H-chromene molecule, HA14-1.
Figure 1.25 - Chemical structures of the 4H-chromene derivatives under study.
Figure 1.26 - Chemical structure of the indole derivative under study.
Figure 1.27 - Model of the development of mature biofilm from planktonic cells.
Figure 1.28 - Schematic representation of the attachment of wall teichoic acids to the peptidoglycan cell wall by LCP proteins.
Figure 1.29 - Crystal structure of the extracellular domains of Cps2A from serotype 2 S. pneumoniae D39.
Figure 1.31 - Schematic representation of the sequence homology between the proteins of the LCP family.
Figure 1.32 - Chemical structures of ellagic acid and fisetin.
Figure 3.1 - Bcl-2 IMAC chromatogram with corresponding fractions noted on the SDS-PAGE gel.
Figure 3.2 - Bcl-2 SEC chromatogram with corresponding peaks noted on SDS-PAGE gel.
Figure 3.3 - LytR IMAC chromatogram with corresponding peaks noted on SDS-PAGE gel.
Figure 3.4 - LytR SEC chromatogram for the buffer exchange.
Figure 3.5 - LytR SEC chromatogram with corresponding peaks noted on SDS-PAGE gel.
Figure 3.6 - Crystals of multiple nature obtained in a derived condition with ammonium sulphate and glycerol as precipitant agents.
Figure 3.7 - LytR crystals obtained from a condition with 50% MPD as precipitant agent.
Figure 3.8 - LytR crystals obtained from two different conditions.
Figure 3.9 - Bcl-2 full-length model predicted by the Phyre2 server.
Figure 3.10 - Three-dimensional structure of LytR LCP domain, predicted by homology modeling.
Figure 3.11 - Representation of each ligand best pose from the molecular docking results of Bcl-2 with the candidate inhibitors.
Figure 3.12 - Protein-ligand interaction mapping of the docking results for venetoclax, ethoxy and activemethine derivatives.
Figure 3.13 - Best pose docked for the indole 15 molecule to Bcl-2.
Figure 3.14 - Representation of the best pose from each ligand on the LytR modeled structure.
Figure 3.15 - Mapping of the LytR residue network interacting with the substrate, decaprenyl-phosphate.
Figure 3.16 - Mapping of protein-ligand interactions of LytR with ellagic acid and fisetin.
Figure 3.17 - TSA results of Bcl-2 incubated with the 4H-chromene derivatives.
Figure 3.18 - TSA of Bcl-2 in different percentages of DMSO.
Figure 3.19 - TSA of Bcl-2 incubated with venetoclax and the indole derivative in 2% DMSO.
Figure 3.20 - TSA results of Bcl-2 with venetoclax and the indole derivative.
Figure 3.21 - TSA results of LytR incubated with ellagic acid and fisetin.
Figure 3.22 - TSA results of LytR incubated with ellagic acid and fisetin in 2% DMSO.
Figure 3.23 - Urea gel of unbound Bcl-2 and upon incubation with the ethoxy, activemethine and indole derivatives.
Figure 3.24 - Urea gel of unbound Bcl-2 and upon incubation with the indole derivative and venetoclax.
Figure 3.25 - Results of the ITC experiments of Bcl-2 with the indole derivative and venetoclax.
Figure 3.26 - ITC experiment of LytR with fisetin.
Figure 3.28 - Co-crystal of Bcl-2 with the indole derivative.
Figure 3.29 - Co-crystal of Bcl-2 with venetoclax.
Figure 3.30 - 1H and STD NMR spectra of Bcl-2 incubated with the indole derivative.
Figure 3.31 - 1H and STD NMR spectra of LytR incubated with ellagic acid.
Figure 3.32 - 1H and STD NMR spectra of LytR incubated with fisetin.
Figure 3.33 - Mapping of the Bcl-2 residue occupancy probability of venetoclax, ethoxy and indole derivatives, from MD simulations.
Figure 3.34 - RMSD and essential dynamics analysis of the MD simulations of native form Bcl-2 and in the presence of the ligands under study.
Figure 6.1 - 1H NMR spectrum of the indole derivative.
Figure 6.2 - 1H NMR spectrum of ellagic acid.

List of tables


List of abbreviations

α – Wave phase
λ – Wavelength
ΔG – Gibbs free energy
ADME – Absorption, Distribution, Metabolism, Excretion
AP – Anionic polymers
Bcl-2 – B-cell lymphoma 2
CCD – Cyclic coordinate descent
CD – Circular dichroism
CYP450 – Cytochrome P450
Da – Dalton
HTS – High throughput screening
HMM – Hidden Markov models
ITC – Isothermal titration calorimetry
KD – Dissociation constant
LCP – LytR-Cps2A-Psr
MAD – Multiple wavelength anomalous dispersion
MD – Molecular dynamics
MIR – Multiple isomorphous replacement
MOM – Mitochondrial outer membrane
MR – Molecular replacement
MW – Molecular weight
NMR – Nuclear magnetic resonance
nsSNP – Non-synonymous single nucleotide polymorphism
OD – Optical density
PCD – Programmed cell death
PDB – Protein data bank
PG – Peptidoglycan
RMSD – Root-mean-square deviation
SAD – Single wavelength anomalous dispersion
SAXS – Small angle X-ray scattering
SBDD – Structure-based drug design
SBVS – Structure-based virtual screening
SPR – Surface plasmon resonance
STD – Saturation transfer difference
Tm – Melting temperature


1.1. Drug Discovery

1.1.1. Historical Perspective

Since the 19th century, drug research has attracted rapidly growing interest and resources, enabling numerous triumphs in understanding and treating diseases. At that time, chemistry had matured and its principles started to be applied to other fields, especially pharmacology1. By the 1890s, the foundations of chemistry, such as Avogadro’s atomic hypothesis, the periodic table of elements, Arrhenius’s theory of acids and bases and Kekulé’s theory on the structure of aromatic compounds, had been established2,3.

In medicine, the studies led by Paul Ehrlich on the selective affinity of dyes for biological tissues were a fundamental step, since they led him to postulate the existence of “chemoreceptors”4. Later, Ehrlich proposed that a therapeutic advantage could be gained from the hypothetical differences between analogous chemoreceptors of parasites, microorganisms and cancer cells in comparison with host tissues.

Analytical chemistry also played an important role in drug research: during the 19th century, several active compounds from medicinal plants were isolated and purified. Examples include morphine and papaverine, isolated in 1815 and 1848, respectively2,5 (Figure 1.1).

Figure 1.1 - Chemical structures of the first two active compounds isolated from medicinal plants, morphine and papaverine.

Endogenous bioactive ligands, namely steroid hormones, were identified even before their biomolecular targets were established, isolated and structurally characterized6. However, their biological relevance could often be inferred, despite the lack of understanding of the underlying physiological mechanisms6. Biochemistry was pivotal in closing this gap, first by proposing enzymes and receptors as good drug targets2, and later by elucidating and connecting the pathways and mechanisms of these biomolecular receptors6. At the beginning of the 20th century, discoveries concerning steroids and their impact on numerous physiological processes gave rise to several Nobel Prizes through the 1920s and 1930s. In 1928, Adolf Windaus was recognized for his research into the constitution of sterols and their connection with vitamins. Paul Karrer’s work on carotenoids, flavins and vitamins A and B2 was acknowledged in 1937. The following year, Richard Kuhn received the Nobel Prize, also for his discoveries regarding carotenoids and vitamins. Finally, in 1939, Adolf Butenandt was recognized for his work on sex hormones.

During the 20th century, X-ray crystallography emerged as a ground-breaking technique to study the three-dimensional structure of macromolecules, namely proteins. The realization that this knowledge could shed important light on drug research came in the 1980s and 1990s7. The rational design of potential drug candidates using the structure of their target proteins became a reachable goal. In 1990, the first examples in which this methodology was essential to obtaining successful inhibitors of HIV-1 protease were reported8,9. Structure-based drug design (SBDD) therefore became a crucial step in the drug discovery programs of the pharmaceutical industry and one of the main research focuses in academic laboratories7.

In the past decades, two approaches emerged for the discovery of novel drug candidates for a specific biomolecular receptor among the plethora of small molecules available. As a predominantly empirical method, high throughput screening (HTS) promised to deliver new interacting compounds that demonstrate in vitro activity against the studied target. Large libraries are screened through biophysical techniques, giving rise to a few promising hits that may require further optimization10. Meanwhile, advances in computer engineering and computational methods allowed in silico techniques to enter drug discovery. Virtual screening comprises the application of computational methods, such as quantum chemistry, molecular docking and molecular dynamics (MD), to screen extensive libraries, delivering novel drug candidates more quickly and at lower cost6.

1.1.2. Overview

Drug discovery is driven by the need to find a therapeutic solution for a specific pathology. The initial step of this endeavour is often conducted by academia10, which attempts to develop a hypothesis that the inhibition or activation of a protein or pathway may result in an effective treatment of the disease state. The result of this effort is the identification and selection of a biological target which may require further validation prior to progression to the lead discovery phase10. The drug discovery process can be described by the combination of multiple phases: target identification, target validation, hit discovery process, hit series selection, hit-to-lead phase and lead optimization phase10 (Figure 1.2).

1.1.2.1. Target Identification

The first step in drug discovery is target identification, which entails the selection and characterization of a macromolecule involved in the disease phenotype that either through inhibition or activation can generate the desired therapeutic response. Besides proteins, such as enzymes or receptors, genes and RNA may also be considered as targets10. A crucial parameter that defines a good therapeutic target is its “druggability”. A “druggable” target must be accessible to the drug molecule and provide a biological response that can be measured in both in vitro and in vivo assays10.

There are several approaches that may lead to successful target identification. A phenotypic screening can be performed to determine, for example, which protein is associated with a specific disease state10. This can be the case of a protein which, upon overexpression, causes a pathology. Its identification is possible through techniques such as mass spectrometry11. Another powerful approach is genomics, which may indicate that a disease state is caused by a genetic modification, such as polymorphisms or translocations10. These alterations can jeopardise the protein’s stability and function, and can also promote or silence its expression. An integrated approach is possible through computational efforts, namely data mining, which allows the integration of relevant biomedical data from publications, patents, genomics, proteomics, transgenic phenotyping and compound profiling to identify potential disease targets10,12.

Figure 1.2 - Schematic representation of a drug discovery process: target identification, target validation, hit discovery process, hit series selection, hit-to-lead phase and lead optimization phase.

1.1.2.2. Target Validation

Once the target has been identified, its implications for the disease phenotype must be validated through in vitro techniques. One methodology to validate a target is antisense technology, in which RNA-like oligonucleotides are designed to be complementary to the mRNA precursor of the target10. The target’s translation is thus suppressed by blocking the translation machinery. One major advantage of this technique over the gene knockout approach is its reversibility10. The knockout approach is based on in vivo experiments that use transgenic animals lacking the gene of interest. A similar approach is called knock-in, where a non-enzymatically functioning protein replaces the endogenous target13. This approach allows the observation of the animal’s response to an effective treatment, since the protein is expressed but functionally inhibited. The need to make tissue-restricted and inducible knockouts led to the application of small interfering RNA (siRNA)10,14. The process starts with the injection of double-stranded RNA (dsRNA), which is recognized by the cell as exogenous genetic material, activating the RNAi pathway. The dsRNA is cleaved into small fragments, called siRNA, which are separated into single strands and associate with an RNA-induced silencing complex. This leads to recognition and cleavage of the endogenous target’s mRNA, preventing its translation. A different approach to target validation is the use of monoclonal antibodies, as they can interact directly with the target10. Their specificity is a major advantage because they may discriminate between very closely related proteins, recognizing unique epitopes.

1.1.2.3. Hit Discovery Process

The hit discovery process relies on the detection of a small set of compounds that demonstrate effective interactions with the target under study. These “hits” are found through screening assays of large libraries of small molecules. Nowadays, numerous screening strategies are adopted10.

Fragment screening focuses on building libraries of molecules with lower molecular weight, increased polarity and enhanced solubility compared with drug-like compounds. These small molecules are screened at high concentrations against the target of interest, and the successful hits allow the identification of building blocks capable of forming an inhibitor candidate18. The structure determination of the complexes between the target molecule and the hits, through X-ray crystallography or nuclear magnetic resonance (NMR), is valuable to enable compound progression10.

Physiological screening is an in vitro approach that screens compounds in tissue-based assays aiming to find hits that provide a response similar to the in vivo desired effect10. As in the cell-based assays in HTS, this methodology requires secondary experiments to determine target specificity.

Focused or knowledge-based screening comprises filtering large libraries to provide restricted small molecule subsets with chemical properties known to have activity toward the target molecule19. This knowledge is gathered from the literature and gave rise to the computational approach of virtual screening10. The search for common antagonists of the MDM2/4-p53 systems is a great example of the application of the knowledge-based screening approach20. In this case, MDM2 and MDM4 are homologous proteins which inhibit the tumor-suppressing protein p53 by different mechanisms. Thus, combined inhibition of both homologues would activate p53 more significantly than antagonizing only one of them. Through X-ray crystallography and NMR, structural information was obtained regarding the MDM2-p53 complex interactions and also between MDM2 and ligands with nanomolar potency. This knowledge led to the realization that fundamental interacting residues were conserved, at least in their hydrophobic nature, between MDM2 and MDM4. This was the premise for the identification of novel selective and dual MDM4 and MDM2 inhibitors, through knowledge-based screening, using the MDM2-p53 system as reference.

Virtual screening is an in silico method that allows the screening of vast small molecule databases more quickly and cost-efficiently, filtering the promising candidates that should be tested in the laboratory6,7,21. Through molecular docking and scoring algorithms, the screened compounds are ranked according to the predicted affinity to the target molecule6,21. To do so, the target’s three-dimensional structure is required, which may be provided by experimental methods, such as X-ray crystallography and NMR, or by computational methods, such as homology modeling and ab initio calculations7,10,21.

All the above-mentioned methodologies require biophysical and biochemical techniques for hit detection. These experiments may be applied to the individual target, cell cultures, or even grown tissues.


appropriate probes to detect compound activity toward the target of interest would greatly benefit cell-based techniques’ implementation22.

Assays concerning only the individual target may provide potential chemotypes with simple structure-activity relationship and mechanism of action, resulting in the best approach to the hit discovery and hit-to-lead optimization phases22. Further cell-based studies would contribute with cell penetration, activity and stability evaluations22. However, in the case of an overwhelming number of hits emerging in the hit discovery phase screenings, the cell-based approach becomes a suitable alternative to preselect a subset of compounds with presumed specific cellular activity22.

1.1.2.4. Hit series selection

To avoid an exaggerated number of hits resulting from an active compound screening, which would diminish the chances of reaching an effectively promising hit, a triaging process must be performed. The first measure to be taken is the removal of compounds which emerge frequently as hits in various assays from the hit series library10. Another important approach is to use computational chemistry algorithms developed to group compounds according to their structural similarity10. Through this filtering, promiscuous compounds will be excluded from the hit series, which will possess a broad spectrum of chemical classes to be tested.

Several aspects impact the hit series selection in order to provide promising hits during the drug discovery process. These properties are associated with drug-likeness, toxicity and pharmacokinetics (ADME) (Figure 1.3).

The compounds present in a small molecule library usually comply with criteria that confer drug-like properties, such as the Lipinski rule of five23. This combination of characteristics used to select compounds arises from the statistical observation that most successful commercially available drugs can be described by a few parameters10. In the case of the Lipinski rule of five, these conditions are: molecular weight no higher than 500 Da, cLogP (a measure of lipophilicity, which affects pharmacokinetics) no higher than 5, no more than 5 hydrogen bond donor atoms and no more than 10 hydrogen bond acceptor atoms23.
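As a minimal sketch of how such a filter could be applied during triage, the Python snippet below counts rule-of-five violations for a compound. The `Compound` class, the function names and the two example molecules (with made-up descriptor values) are hypothetical; in practice the descriptors would come from a cheminformatics toolkit. A violation is counted when a value exceeds 500 Da, 5, 5 or 10, respectively, and the `max_violations` tolerance is an assumption added for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Compound:
    """Precomputed descriptors for one library compound (hypothetical container)."""
    name: str
    mol_weight: float       # molecular weight in Da
    clogp: float            # calculated logP (lipophilicity)
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(compound: Compound) -> List[str]:
    """Return a description of every rule-of-five criterion the compound breaks."""
    violations = []
    if compound.mol_weight > 500:
        violations.append("molecular weight > 500 Da")
    if compound.clogp > 5:
        violations.append("cLogP > 5")
    if compound.h_bond_donors > 5:
        violations.append("more than 5 hydrogen bond donors")
    if compound.h_bond_acceptors > 10:
        violations.append("more than 10 hydrogen bond acceptors")
    return violations


def passes_rule_of_five(compound: Compound, max_violations: int = 0) -> bool:
    """Accept a compound if it breaks at most `max_violations` criteria."""
    return len(lipinski_violations(compound)) <= max_violations


# Made-up descriptor values, for illustration only.
small = Compound("hypothetical_fragment", 180.2, 1.2, 1, 4)
large = Compound("hypothetical_macrocycle", 812.0, 6.3, 4, 9)

for candidate in (small, large):
    print(candidate.name, passes_rule_of_five(candidate), lipinski_violations(candidate))
```

Keeping the list of violated criteria, rather than a single pass/fail flag, makes it easier to see which property a borderline compound would need to improve.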

In many cases, reversibility is of great interest regarding the interactions between the hit and its target, ensuring that the drug is metabolized and excreted from the patient’s body10. To assess this property it is pivotal to obtain dose-response curves in the primary hit discovery assays10. Secondary assays may be focused on examining the surviving hits in a cell-based approach, which would provide a functional response to the compound10.

Chemical synthesis also has an important role in defining a hit series, since criteria like synthetic route, derivatization potential and amenability to parallel synthesis are sought in a hit-to-lead optimization phase10.


1.1.2.5 Hit-to-lead phase

The purpose of the hit-to-lead phase is to refine the promising hit series detected previously. This refinement is meant to improve compound potency and selectivity besides its pharmacodynamic and pharmacokinetic properties10. The latter can be assessed through cell-based techniques, using in vivo models, while the target-hit relationship is studied by biophysical and biochemical assays.

To establish the activity relationship between the hits and the target, a structure-based drug design approach is usually implemented10. This entails structural characterization through experimental techniques such as X-ray crystallography and NMR, namely saturation transfer difference NMR (STD-NMR), but can also imply in silico methods like molecular dynamics (MD)10,21,24. This methodology provides details at the atomic level, which are pivotal to describe the mechanism of interaction between a hit and its target.

Compound potency and selectivity investigation can be achieved by several biophysical and biochemical techniques that focus on different aspects of the protein-ligand binding phenomenon. Some examples of such techniques are: thermal shift assay (TSA), urea-polyacrylamide gel electrophoresis, isothermal titration calorimetry (ITC), circular dichroism (CD), STD-NMR, small angle X-ray scattering (SAXS), surface plasmon resonance (SPR), UV/Visible spectroscopy, and fluorescence polarization/anisotropy24–31.

In the case of TSA, the outcome of the protein-ligand interaction probed is the increase in protein stability upon ligand binding, which is reflected in the protein’s augmented resistance to thermal denaturation27. Regarding techniques such as urea-polyacrylamide gel electrophoresis, CD and SAXS the biophysical property observed is the structural difference between the native protein and the protein-ligand complex24,30,31. SPR, UV/Visible spectroscopy, and fluorescence polarization/anisotropy binding assays are based on the premise that a protein-ligand complex will change the electronic properties of the protein, promoting variations upon interaction with light24.

[Figure 1.3 labels: Hit series; ADME; Toxicity; Drug-likeness]

Other considerations must be taken in the hit-to-lead phase regarding the pharmacodynamic and pharmacokinetic processes. Solubility and permeability of a compound are evaluated in order to devise the appropriate delivery strategy to its target, either by injection or oral uptake10. Cytochrome P450 (CYP450) inhibition is also investigated, since it is a pivotal element of the patient’s metabolism, which may suffer from undesired drug interference10. Other metabolizing enzymes such as aldehyde oxidase are attracting growing attention from medicinal chemists, due to their ability to react with scaffolds that were originally devised to circumvent CYP450 activity32,33.

1.1.2.6. Lead optimization phase

The aim of this final stage of the drug discovery process is the improvement of the leading compound’s undesirable characteristics while maintaining its promising properties10.

Optimization may be focused on several issues such as compound potency, selectivity, stability or solubility. After surpassing these hurdles the leading candidate may move to a preclinical stage, while other assessments continue to be made. Genotoxicity models, such as the Ames test34, are commonly used to examine the lead, alongside in vivo behaviour evaluation through, for example, Irwin’s test35.


1.2. Structure-based Virtual Screening (SBVS)

The pharmaceutical industry’s approach to drug discovery is primarily focused on HTS, which has several drawbacks such as high cost, time demand and mechanistic uncertainty6,21. Academic research is usually directed toward SBDD and computational methods to avoid those issues and to allow a more rational and iterative design of drug molecules.

Virtual screening, as an alternative to HTS, provides hits for a target from commercially available compound libraries, prior to experimental testing, more quickly and cost-effectively21. These compounds are selected by their predicted affinity to the target of interest and progress to the hit-to-lead phase by means of SBDD. SBVS is based on the search for active compounds of a structurally characterized target, through a computational approach. It comprises several stages that may be identified as: target structural determination, virtual screening, target-ligand complex characterization and target-ligand structural determination.

1.2.1. Target structural determination

The first step of a SBVS strategy entails the determination of the three-dimensional structure of the target of interest. This endeavour is predominantly achieved through X-ray crystallography, although NMR and computational techniques, such as homology modeling and ab initio methods, may also contribute to this end7,10,21. In the work here described, X-ray crystallography and homology modeling, together with ab initio methods, were the main contributors to the structural knowledge required for the virtual screening approach.

1.2.1.1. X-ray crystallography

Figure 1.4 - Schematic representation of the X-ray crystallography workflow. The first step comprises the protein crystallization, which should be optimized to yield high-quality crystals. The scattering of the X-rays caused by the atoms’ electrons in the crystal is recorded and the “Phase Problem” is solved through well-established methods. Afterwards, the first protein model is built and successive stages of refinement and

1.2.1.1.1. Protein crystallization

Protein crystals are three-dimensional ordered arrays of repeated units containing the protein atoms. They are formed through controlled precipitation of protein molecules from aqueous solution under specific physicochemical conditions. A crystal can be divided into its basic unit, named the unit cell, which corresponds to its smallest and simplest volume element and is described by the edges a, b and c and the angles α, β and γ36 (Figure 1.5). The unit cell contains the asymmetric unit, which is the smallest entity capable of forming the unit cell through crystallographic symmetry operations36.

The systematic repetition of the unit cell that forms a protein crystal is what makes X-ray diffraction capable of providing information for structure determination.
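The six unit cell parameters fix the cell volume through the standard relation V = abc·sqrt(1 − cos²α − cos²β − cos²γ + 2·cosα·cosβ·cosγ). The snippet below is a small sketch of that relation; the function name and the example cell dimensions are illustrative and are not taken from this work.

```python
import math


def unit_cell_volume(a, b, c, alpha_deg, beta_deg, gamma_deg):
    """Volume of a general (triclinic) unit cell; edges in Å, angles in degrees."""
    ca = math.cos(math.radians(alpha_deg))
    cb = math.cos(math.radians(beta_deg))
    cg = math.cos(math.radians(gamma_deg))
    # The term under the square root equals 1 for an orthogonal cell.
    return a * b * c * math.sqrt(1.0 - ca**2 - cb**2 - cg**2 + 2.0 * ca * cb * cg)


# Illustrative cells (arbitrary dimensions).
cubic = unit_cell_volume(10.0, 10.0, 10.0, 90.0, 90.0, 90.0)
monoclinic = unit_cell_volume(40.0, 60.0, 50.0, 90.0, 105.0, 90.0)
print(f"cubic cell: {cubic:.1f} Å^3")
print(f"monoclinic cell: {monoclinic:.1f} Å^3")
```

For a monoclinic cell (α = γ = 90°) the expression reduces to V = abc·sin β, which is a convenient sanity check.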

Protein crystallization is a challenging endeavour because of the numerous physicochemical variables involved, which range from sample preparation, solution pH and ionic strength to temperature and the precipitating agent used37. Also, proteins are very complex macromolecules stabilized by intramolecular hydrogen bonds and solvent shells36. The goal is to promote protein stabilization at extremely high concentrations, which allows controlled precipitation while maintaining the non-covalent interactions essential to the crystal’s ordered nature.

Protein crystallization can be achieved through several methods: vapor diffusion, microbatch under oil, or microdialysis36–38. The most common method is vapor diffusion, because it uses small amounts of protein and can be implemented through two different techniques, hanging or sitting drop (Figure 1.6). In the vapor diffusion method, a precipitant solution, also present in the reservoir, is added to a protein drop. The reservoir and the protein-precipitant drop are sealed, usually with a coverslip, and water is transferred as vapor from the drop to the reservoir solution, since the precipitant concentration is lower in the drop and the closed system is driven toward equilibrium.36

Figure 1.5 - Representation of a (triclinic) unit cell


The protein crystallization process entails many stages, which can be followed and understood through a phase diagram (Figure 1.7)36–38. Starting with a protein solution at high concentration and purity, the precipitant is added and the system is closed. At this initial stage, the drop is undersaturated. Then, as the drop’s water content starts to decrease, the solution becomes supersaturated, hopefully reaching the labile region of the phase diagram. In the case of excessive protein and/or precipitant concentrations, the system finds itself in the precipitation region, where undesired amorphous aggregation of protein molecules occurs. If the conditions are favourable, nucleation begins, meaning that the first protein molecules start to interact non-covalently, creating ordered and stable nuclei. Afterwards, the nuclei turn into crystals, with fewer protein molecules in solution, which corresponds to the metastable zone.

In the case of microbatch under oil, the protein solution is mixed with the precipitant and submerged under mineral oil. This prevents any vapor diffusion events and only the precipitant agents are responsible for the crystallization phenomenon.38

The above-mentioned phase diagram is an effective guide for the crystallization process; however, it concerns solely the relationship between protein and precipitant concentrations. One of the most time-consuming steps is finding the promising preliminary precipitant solutions that, through optimization, can lead to crystals suitable for X-ray diffraction. Currently there is a plethora of crystallization screens commercially available from companies such as Hampton Research, Jena Bioscience and others, covering a broad range of conditions and designed to provide the desired preliminary crystallization conditions. Further optimization of these conditions is usually required and can be achieved by variation of an enormous number of parameters: methodology, protein and precipitant concentrations, reservoir volume, protein-precipitant drop ratio, buffer pH, addition of salts/polymers/organic molecules/detergents, temperature and many others37,38.

Figure 1.6 - Schematic representation of the vapor diffusion method associated with the hanging (A) and sitting (B) drop techniques.

Figure 1.7 - Representation of the Phase Diagram used to guide a protein

1.2.1.1.2. X-ray diffraction

After obtaining an appropriate protein crystal, it is mounted on a goniometer head and exposed to a collimated, monochromatic and intense X-ray beam. The result is a diffraction pattern captured in a detector plate where the diffracted X-rays collide (Figure 1.8). The three most common sources of X-rays are: X-ray tubes, rotating anode tubes and synchrotrons.36

In-house diffractometers commonly use X-ray tubes to produce X-rays, where a heated filament generates electrons that are accelerated by an electric field toward a metal target (usually copper, molybdenum or chromium). These high-energy electrons collide with the metal atoms and displace electrons from low-lying atomic orbitals. An electron from a higher orbital then occupies the vacant lower orbital position, emitting its excess energy as an X-ray photon. Because the orbital energies are characteristic of each metallic element, different target metals yield X-rays of different characteristic wavelengths.36

Synchrotron sources (Figure 1.9) produce X-rays as a consequence of electron acceleration through increasing magnetic fields that are synchronized with the electrons’ kinetic energy39. These electrons travel through a closed-loop path and are kept in particle storage rings. The bending of the accelerated electrons’ path by magnets produces tangential X-ray photons that are captured by beamlines. Synchrotron radiation has several advantages compared to in-house sources: higher brilliance, tuneable wavelength and bandwidth. Indeed, collection of a complete data set from a crystal using synchrotron radiation can be achieved in several seconds, while the same experiment can take hours using in-house sources, which are also limited to lower resolution.36

Figure 1.8 - Schematic representation of the X-ray diffraction experiment. An intense X-ray beam is directed at the protein crystal (a three-dimensional array of repeated and ordered units). The X-rays are diffracted by

The X-ray diffraction experiment generates a diffraction pattern comprising reflections, which are the spots recorded on the detector plate where the diffracted X-rays collided. This phenomenon occurs according to Bragg’s law, through the constructive and destructive interference of electromagnetic waves. These reflections correspond to the reciprocal space and are indexed using the coordinates hkl (Miller indices). The reciprocal space is depicted in the recorded diffraction pattern and has an inverse relationship with the real space where the protein structure is represented. Bragg’s law states that the reflections in a diffraction pattern are the result of constructive interference between the electromagnetic waves diffracted by all the atoms of the crystal (Figure 1.10). The combination of the diffracted waves gives rise to different intensities for each reflection. This is the other important parameter for structure determination, the reflection’s intensity (Ihkl).36
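Bragg’s law is commonly written as nλ = 2d·sin θ, where d is the spacing between lattice planes, θ the angle of incidence and n the diffraction order. The text does not state the formula explicitly, so the sketch below is only an illustration under that standard form; the function names are hypothetical and the wavelength is the usual Cu Kα value of about 1.5418 Å.

```python
import math

CU_K_ALPHA = 1.5418  # Å, typical in-house copper source (Cu K-alpha)


def d_spacing(wavelength, theta_deg, order=1):
    """Interplanar spacing d (Å) that diffracts at the angle theta (degrees)."""
    return order * wavelength / (2.0 * math.sin(math.radians(theta_deg)))


def bragg_angle(wavelength, d, order=1):
    """Angle theta (degrees) at which planes of spacing d (Å) diffract."""
    sine = order * wavelength / (2.0 * d)
    if sine > 1.0:
        raise ValueError("no diffraction possible: n*lambda exceeds 2d")
    return math.degrees(math.asin(sine))


# Smaller spacings (finer detail) diffract to higher angles.
for spacing in (10.0, 3.0, 2.0):
    print(f"d = {spacing:4.1f} Å -> theta = {bragg_angle(CU_K_ALPHA, spacing):5.2f} degrees")
```

The inverse relationship between d and sin θ is the same reciprocity between real and reciprocal space mentioned above: high-resolution information is found at the outer edge of the diffraction pattern.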

Figure 1.9 – Schematic representation of a synchrotron (adapted from: http://www.synchrotron.org.au/synchrotron-science/what-is-a-synchrotron).

Figure 1.10 - Schematic representation of the Bragg's law, which is based on the constructive interference of waves. The reflection resulting of the sum of X-ray 1 and 2 (R1 and R2) has strong intensity if the distance

1.2.1.1.3. Structure determination

What crystallographers try to accomplish is to describe the electromagnetic waves diffracted by the atoms’ electron densities, which requires three physical parameters: wavelength (λ), amplitude (|F|), and phase (α). The first one is defined during the X-ray diffraction experiment, since it is tuneable by the user and intrinsic to the apparatus. The second one is measured experimentally, being directly related to the reflection intensities recorded on the detector plate. The last one is the missing information that originates the “Phase Problem” in crystallography.36,40

Electromagnetic waves are periodic functions and even the most intricate ones can be described by the sum of sine and cosine functions. These sums are called Fourier series and are used to represent the X-rays diffracted by the unit cell’s atoms as structure-factor equations, applied to each reflection recorded (Fhkl) (Equation 1). Experimentally, these structure-factors are determined by calculating the square-root of the relative intensities of each reflection and taking into account correction factors (Fobs).36
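To make the structure-factor idea concrete, the sketch below evaluates the discrete, point-atom analogue of Equation 1, F_hkl = Σ_j f_j·exp[2πi(h·x_j + k·y_j + l·z_j)], for atoms given in fractional coordinates. Treating each atom as a point with a constant scattering factor f_j (no angle dependence, no thermal motion) is a simplifying assumption, and the function names and the two toy atoms are hypothetical.

```python
import cmath
import math


def structure_factor(atoms, h, k, l):
    """Sum the waves scattered by every atom for reflection (h, k, l).

    `atoms` is a list of ((x, y, z), f) pairs: fractional coordinates and a
    constant scattering factor (a simplification, see the text above).
    """
    return sum(
        f * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
        for (x, y, z), f in atoms
    )


def amplitude_from_intensity(intensity):
    """|F| is proportional to the square root of the measured intensity."""
    return math.sqrt(intensity)


# Two identical toy atoms half a cell apart along x.
toy_atoms = [((0.0, 0.0, 0.0), 6.0), ((0.5, 0.0, 0.0), 6.0)]

for hkl in [(1, 0, 0), (2, 0, 0)]:
    f_hkl = structure_factor(toy_atoms, *hkl)
    print(hkl, "amplitude =", round(abs(f_hkl), 6), "intensity =", round(abs(f_hkl) ** 2, 6))
```

In this toy case the two waves cancel for (1, 0, 0) and reinforce each other for (2, 0, 0), which is the interference behaviour that gives each reflection its own intensity. Going from the intensity back to F recovers only the amplitude, not the phase.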

$$F_{hkl} = \int_{V} \rho(x, y, z)\, e^{2\pi i (hx + ky + lz)}\, dV \tag{1}$$

Through a Fourier transform, it is possible to convert the structure-factors into the average electron density of a volume element centred at x, y, z (Equation 2). This transform switches the integral into a triple sum, because the Fhkl represent discrete values, which are the reflections in the diffraction pattern. The equation expresses the desired electron densities as a function of the known |Fhkl| and the unknown αhkl. To calculate the electron density map, the phase angle of each reflection must be estimated. This challenge can be overcome by different methods: multiple isomorphous replacement (MIR), single- or multiple-wavelength anomalous dispersion (SAD/MAD) or molecular replacement (MR). MIR relies on the preparation of heavy-atom derivatives of crystals with identical unit cell parameters. These heavy atoms cause differences in the intensity of some reflections which, when used in Patterson maps, enable the localization of these atoms in the unit cell, thus allowing the estimation of initial phases. SAD and MAD are based on the characteristic X-ray absorption of some atoms, such as selenium, which alters the intensity of symmetry-related reflections, the Friedel pairs. After obtaining the atoms’ positions in the unit cell through Patterson methods, it is possible to estimate the phases of all reflections. However, if solved structures of homologues of the protein of interest are available, MR is commonly used. This method applies a known structure as a phasing model. Since the phases of atomic structure-factors depend on the position and orientation of the structure in the unit cell, the superposition of the model onto the unknown structure is searched. Using Patterson maps of both the known and the unknown structures, rotational and translational operations are performed to achieve the best superposition, which is judged by the agreement between the amplitudes of the calculated structure-factors, |Fcalc|, and the measured amplitudes, |Fobs|. The computed phases of the model structure-factors are used as estimates for structure determination of the unknown structure.36,40

After solving the phase problem and obtaining an initial structure from the experimental structure-factors and the estimated phases, the model is submitted to several cycles of refinement to yield a reliable crystallographic model. To achieve this, the crystallographer must interpret the model and adjust the atoms’ coordinates in accordance with the suggested electron densities. Then, through Fourier transforms, these electron densities are reconverted into calculated structure-factors (Fcalc) and better phases can be retrieved from these data. Statistical parameters, such as R and Rfree (Equation 3), are important to evaluate a model’s quality. Both terms measure the disagreement between the model and the experimental data. In the case of Rfree, a percentage of the original data set, usually 5 to 10%, is set aside for comparison with the Fcalc; since these data are excluded from the refinement calculations, they are not affected by misinterpretation of the model.36,40 Other parameters, such as the root-mean-square deviation (rmsd) of bond lengths and angles or the Ramachandran plot, are also used to evaluate model geometry and, more generally, reliability.

$$\rho(x, y, z) = \frac{1}{V} \sum_{h} \sum_{k} \sum_{l} |F_{hkl}|\, e^{-2\pi i (hx + ky + lz - \alpha_{hkl})} \tag{2}$$

Equation 2 - Equation of the atom's electron density as a function of the structure-factors, where αhkl is the unknown phase that must be estimated to solve the structure of interest.
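The role of the phases in Equation 2 can be illustrated with a one-dimensional toy “crystal”. The sketch below computes structure factors for two point atoms, then rebuilds the density from the amplitudes twice: once with the true phases and once with all phases set to zero. Everything here is an assumption made for illustration (point atoms, a unit cell of length 1, phases handled in radians, hypothetical function names); it is not the procedure used by any crystallographic package.

```python
import cmath
import math


def structure_factors_1d(atoms, h_max):
    """F_h = sum_j f_j * exp(2*pi*i*h*x_j) for h = -h_max .. h_max."""
    return {
        h: sum(f * cmath.exp(2j * math.pi * h * x) for x, f in atoms)
        for h in range(-h_max, h_max + 1)
    }


def density_1d(amplitudes, phases, x, cell_length=1.0):
    """rho(x) = (1/L) * sum_h |F_h| * exp(-2*pi*i*h*x + i*alpha_h), alpha in radians."""
    total = sum(
        amplitudes[h] * cmath.exp(-2j * math.pi * h * x + 1j * phases[h])
        for h in amplitudes
    )
    return total.real / cell_length


def peak_index(amplitudes, phases, n_grid=100):
    """Grid index (x = index / n_grid) of the highest density value."""
    values = [density_1d(amplitudes, phases, i / n_grid) for i in range(n_grid)]
    return max(range(n_grid), key=values.__getitem__)


# Two toy atoms: (fractional position, scattering weight).
toy_atoms = [(0.2, 6.0), (0.7, 8.0)]
F = structure_factors_1d(toy_atoms, h_max=10)
amplitudes = {h: abs(value) for h, value in F.items()}
true_phases = {h: cmath.phase(value) for h, value in F.items()}
zero_phases = {h: 0.0 for h in F}

print("highest peak with true phases at x =", peak_index(amplitudes, true_phases) / 100)
print("highest peak with zeroed phases at x =", peak_index(amplitudes, zero_phases) / 100)
```

With the correct phases the strongest peak falls on the heavier atom, whereas the same amplitudes combined with wrong phases place it elsewhere, which is why the amplitudes alone cannot solve the structure.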

$$R = \frac{\sum \left|\, |F_{obs}| - |F_{calc}| \,\right|}{\sum |F_{obs}|} \tag{3}$$

Equation 3 - Equation of the R-factor, which determines the amount of data that do not agree with the
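As a rough sketch of how R and Rfree (Equation 3) relate, the snippet below computes the R-factor for lists of observed and calculated amplitudes and applies the same formula to a randomly selected “free” subset. The function names, the random selection scheme and the example amplitudes are hypothetical, and scaling between Fobs and Fcalc is ignored. In a real refinement the free reflections are flagged once and kept out of every refinement cycle, which this toy partition does not model.

```python
import random


def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over matching reflections."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty lists of equal length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def split_free_set(n_reflections, free_fraction=0.05, seed=0):
    """Return (work_indices, free_indices); the free set is a random subset."""
    rng = random.Random(seed)
    n_free = max(1, round(n_reflections * free_fraction))
    free = set(rng.sample(range(n_reflections), n_free))
    work = [i for i in range(n_reflections) if i not in free]
    return work, sorted(free)


def r_work_and_r_free(f_obs, f_calc, free_fraction=0.05, seed=0):
    """Apply the same R formula separately to the work and free reflections."""
    work, free = split_free_set(len(f_obs), free_fraction, seed)

    def pick(values, indices):
        return [values[i] for i in indices]

    return (
        r_factor(pick(f_obs, work), pick(f_calc, work)),
        r_factor(pick(f_obs, free), pick(f_calc, free)),
    )


# Made-up amplitudes for three reflections.
example_obs = [10.0, 20.0, 30.0]
example_calc = [9.0, 22.0, 30.0]
print("R =", r_factor(example_obs, example_calc))
```

An Rfree much higher than R is usually read as a sign that the model has been fitted to noise, since the free reflections never guided the refinement.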

1.2.1.2. Homology Modeling

Currently, if the therapeutic target’s structure is unknown but structures of homologous proteins have already been solved, homology modeling is an effective approach for structure prediction in the context of SBVS. The basic principles of this technique are:

- protein structure is more conserved throughout evolution than protein sequence;

- there is evidence of a finite number of unique protein folds in nature42.

Hence, proteins with similar sequences, with at least 30% sequence identity, tend to adopt the same fold43. Programs like Phyre244, ModWeb45 and SWISS-MODEL46 can be used for structure prediction through homology modeling. In the present work, the Phyre244 server was chosen and its normal procedure encompasses the following steps (Figure 1.11): 1) sequence homology search against a non-redundant database with less than 20% identity between sequences, multiple sequence alignment, using PSI-BLAST47, secondary structure prediction through PSIPRED48 and conversion of both into a hidden Markov model (HMM) profile; 2) HMM profiles’ search against a database and main-chain model building; 3) insertions and deletions in the sequence built through loop modeling; 4) addition of side-chains to the model.

The first stage of homology modeling performed by Phyre244 is finding the templates based on sequence homology between the protein of interest and an extensive database. The only input required by the Phyre244 server is the protein sequence of interest, inserted manually or as a FASTA format file. HHblits49 is the method used for template retrieval from the protein sequence database. It comprises the conversion of the query sequence to an HMM profile, by producing evolutionary sequence alternatives through mutations to physicochemically similar residues and calculating the probability of each amino acid at each position. Then, the HMM profile is searched against the sequence-to-profile database and successive refinements of the HMM profile are performed49. In addition, the protein’s secondary structure is predicted from the query sequence by PSIPRED48. This process involves neural networks trained on protein sequence profiles and allows the prediction of structural elements, such as α-helices, β-sheets and coils.44

[Figure 1.11 labels: Query sequence; HHblits; Multiple sequence alignment; PSIPRED; Secondary structure prediction; Query hidden Markov model; HMM database of known structures; HHsearch; Alignment between query and template; Crude backbone; Loop modeling; Add side chains; Final model]

Figure 1.11 - Schematic representation of the normal mode protocol followed by the Phyre244 server to

After the HMM profile calculation, the predicted secondary structure is also converted into an HMM profile, giving rise to a merged HMM profile that defines the protein of interest. Afterwards, an HMM-HMM search is performed by HHsearch50, resulting in a list of high-scoring templates of known structure. This generates crude backbone models that do not contain side-chains.44

To predict the target protein structure with insertions or deletions, or even with regions where the sequence differs substantially from the templates, loop modeling is applied51. In the Phyre244 protocol for insertions, a library of fragments of known structure is searched by sequence-profile matching, creating a list of potentially useful fragments with similar sequence and endpoint distances. Similarly, for deletions a sequence window at either endpoint is used for fragment search in the mentioned library44. To model these fragments the cyclic coordinate descent (CCD) algorithm is applied, where one degree of freedom is adjusted at a time to move the end effector toward the target endpoint51.
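The idea behind CCD can be sketched with a planar toy chain: each joint is rotated in turn so that the end of the chain points at the target, and the sweep is repeated until the end effector is close enough. This is only a two-dimensional illustration with hypothetical function names and made-up link lengths. In loop modeling the adjustable degrees of freedom are backbone dihedral angles in three dimensions, and the snippet is not the Phyre2 implementation.

```python
import math


def forward(base, lengths, angles):
    """Joint positions of a planar chain; each angle is relative to the previous link."""
    x, y = base
    heading = 0.0
    points = [(x, y)]
    for length, angle in zip(lengths, angles):
        heading += angle
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        points.append((x, y))
    return points


def ccd(base, lengths, angles, target, tol=1e-6, max_iter=200):
    """Cyclic coordinate descent: adjust one joint angle at a time."""
    angles = list(angles)
    for _ in range(max_iter):
        for j in reversed(range(len(angles))):
            points = forward(base, lengths, angles)
            pivot, end = points[j], points[-1]
            # Rotate joint j so that pivot->end lines up with pivot->target.
            to_end = math.atan2(end[1] - pivot[1], end[0] - pivot[0])
            to_target = math.atan2(target[1] - pivot[1], target[0] - pivot[0])
            angles[j] += to_target - to_end
        end = forward(base, lengths, angles)[-1]
        if math.hypot(end[0] - target[0], end[1] - target[1]) < tol:
            break
    return angles


# Made-up three-link chain and a reachable target.
link_lengths = [1.0, 1.0, 1.0]
target_point = (1.5, 1.0)
solved_angles = ccd((0.0, 0.0), link_lengths, [0.3, 0.3, 0.3], target_point)
end_point = forward((0.0, 0.0), link_lengths, solved_angles)[-1]
print("end effector:", tuple(round(v, 4) for v in end_point))
```

Because each step only solves a one-variable problem, the update is cheap and has a closed form, which is the main appeal of the method for closing loops between fixed anchor residues.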

The final step of the process involves side-chain placement and fitting to the backbone model derived in the previous stages. This is achieved using the residue-rotamer-reduction (R3)52 protocol, which is a graph-based technique that allows elimination of residues and rotamers via global optimization procedures. In this way, the lower energy conformations are preferred and successive iterations eliminate alternatives and simplify the residue graph generated52.

In the intensive mode of Phyre244 modeling (Figure 1.12), the standard protocol is first used to generate several structures by homology modeling of different regions of the protein sequence. Then, the Cα-Cα distance constraints from each structure are extracted. These portions are maintained while the sequences without templates are modeled through ab initio methods, using the Poing algorithm53. Afterwards, the protein model is built from the main-chain representation, through the Pulchra software54. Finally, the side-chains are added and fitted by the R3 protocol52.


calculations and secondary structure prediction using PSIPRED48,53. The solvent bombardment model accounts for solvation effects of each residue, ensuring that hydrophobic amino acids are buried in the protein fold53.

1.2.2. Virtual Screening

Virtual screening relies on docking methods to predict compound binding to a pharmaceutical target of interest. This approach attempts to yield hit molecules by analysing their posing and scoring results against the target55,56. It can be an insightful and efficient technique for probing large compound libraries before progressing to the experimental stages. There are two main variants of virtual screening, depending on the previously available knowledge: ligand-based and structure-based55. Docking and scoring is a fundamental part of either approach in the hit discovery process.

1.2.2.1. Ligand-based Virtual Screening

Ligand-based virtual screening is based on the similarity principle, which implies that similar compounds tend to have similar biological effects. This means that libraries can be probed for novel promising compounds that are structurally similar to one or a few previously determined hits. The measure of similarity is the main difference between the several ligand-based virtual screening methodologies. This evaluation can be established by two-dimensional descriptors, in particular fingerprints and shape comparisons, or three-dimensional descriptors, such as pharmacophore-based ones.55

[Figure 1.12 labels: Query sequence; Normal Phyre protocol; Multiple high-scoring models covering different regions of query; Extract Cα-Cα distance constraints from models; Poing: synthesize from virtual ribosome, springs for constraints, ab initio modeling of missing regions; Backbone and side chain reconstruction; Final model]

Figure 1.12 - Schematic representation of the intensive mode protocol used by the Phyre244 server, which differs from the normal mode by allowing the modeling of multiple regions from different templates and by

Two-dimensional descriptors are commonly used in ligand-based virtual screening due to their simplicity and efficiency. These allow searching for substructures in large compound libraries, finding ligands that contain a privileged motif for a specific target. This methodology can also be important to exclude compounds with undesired characteristics present in the probed library. Molecular fingerprints are widely used to determine structure similarity between two compounds. They are stored as bit strings, wherein the absence (“0”) or presence (“1”) of each substructure in a list is recorded for each molecule (Figure 1.13). Thus, the similarity is computed by comparing the individual bits of the two bit strings and calculating similarity indices, such as the Tanimoto coefficient (Equation 4).55

The strength of a protein-ligand interaction is intimately related to the shape of the binding pocket and the capability of the compound to fill that space. This steric complementarity can be an important factor for ligand-based search, since two molecules with identical shape might possess similar biological activity55. Using this premise, some software, such as rapid overlay of chemical structures (ROCS)57, is able to retrieve similarly shaped compounds from libraries.

Figure 1.13 - Example of bit strings generated for two molecules using a small set of substructures.55

$$T_{AB} = \frac{n_{AB}}{n_{A} + n_{B} - n_{AB}} \tag{4}$$
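A minimal sketch of the Tanimoto coefficient (Equation 4) for two fingerprints stored as bit strings is given below, taking n_A and n_B as the number of bits set in each fingerprint and n_AB as the number of bits set in both. The function name and the example bit strings are hypothetical, and returning 0.0 for two empty fingerprints is a convention chosen here for illustration.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two equal-length strings of '0' and '1'."""
    if len(bits_a) != len(bits_b):
        raise ValueError("fingerprints must have the same length")
    n_a = bits_a.count("1")
    n_b = bits_b.count("1")
    n_ab = sum(1 for a, b in zip(bits_a, bits_b) if a == "1" and b == "1")
    denominator = n_a + n_b - n_ab
    # Two all-zero fingerprints share nothing measurable; 0.0 is an assumed convention.
    return n_ab / denominator if denominator else 0.0


# Made-up fingerprints over six substructures.
molecule_a = "110100"
molecule_b = "100110"
print("Tanimoto similarity:", tanimoto(molecule_a, molecule_b))
```

The coefficient ranges from 0 (no substructure in common) to 1 (identical fingerprints), so a library can be ranked against a known hit by sorting on this value.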
