Study of Metals in Proteins: Deriving Parameters for Molecular Dynamics Simulations from Quantum Mechanical Calculations of Active Centers in Metalloproteins


Acknowledgements

I would like to thank my supervisors, Professor Maria Joao Ramos and Dr. Sergio Sousa, and also my colleague Rui Neves, who were always there to help me solve any problem and who gave me all the freedom of choice I needed in doing research. I also thank all the staff of the University of Porto and the Faculdade de Ciências, and all my colleagues in the Theoretical Chemistry and Computational Biochemistry Group.


Abstract

Precise parameters are crucial for obtaining reliable results from molecular dynamics (MD) simulations. Considerable effort has gone into parameterizing the interactions between different types of atoms. Biomolecular force fields (AMBER, GROMOS, CHARMM, etc.) have made it possible to simulate a wide range of systems and to obtain significant results that in most cases could not be obtained experimentally.

The active centers of enzymes are their most rigid parts, with the highest degree of nonlinear excitation of normal vibronic modes. This makes it possible to release accumulated energy at any instant, which provides the reactivity the enzyme needs. Knowing the precise geometry of the active centre is therefore essential for studying its function. Molecular dynamics simulations are widely used to obtain a dynamic picture of the conformational changes of proteins and of their interactions with each other, with ligands, etc. However, MD parameters for metal-centred active sites are generally not available, even though about half of all enzymes are metalloproteins. The approximations that are then required often lead to an unrealistic description of the active centers of metalloproteins. The current work is devoted to the parameterization of the active centers of nickel metalloproteins. The targets of this study were the F430 cofactor, the only Ni tetrapyrrole derivative that exists in nature and the active site of the enzyme methyl-coenzyme M reductase, and the active centre of NiFe hydrogenase.


Summary

This thesis starts with a brief introduction to metalloproteins. Aspects such as their distribution in the environment, their importance, and the frequency with which metals are involved in biological systems are covered, with the main focus on nickel.

The second part is devoted entirely to the description of the studied systems. The chemical and physical properties of the active centers need to be determined precisely for the calculations to be set up properly.

The following part presents an overview of the main topics at the heart of molecular dynamics, to give the reader a clear picture of what we are doing and why. The main focus is on force-field energy terms, their formulation, and the role of the parameters in these functions.

The quantum mechanics used in the calculations is discussed in the subsequent section. Topics such as the benefits of DFT and the B3LYP functional, the definition of a basis set, and the RESP methodology are explored.

The fifth section describes our algorithm, its benefits compared with other approaches, the QM methods and basis sets used for metal and non-metal atoms, and a few further technical aspects of the calculations.

The sixth section presents the results of our calculations. Harmonic force constants of the relevant energy terms, problems that occurred during the calculations, and the Mulliken charge redistribution are given in tables and figures. The corresponding distribution plots are presented and discussed.

The final part is a conclusion that briefly summarizes the results of the research.


Index

1. Introduction
1.1 Metals in Proteins
1.2 Nickel in Biological Systems
1.3 Coordination chemistry of protein active centers and Ni complexes
2. Description of the Systems under the Study
2.1 Coenzyme F430
2.2 NiFe Hydrogenase
3. Theory of Molecular Dynamics
3.1. Introduction
3.2. Historical Background
3.3. Classical Mechanics
3.4 Integration Algorithms
3.4.1 The Leap-frog algorithm
3.4.2 The Velocity Verlet algorithm
3.4.3 Beeman’s algorithm
3.4. Force Field and Potential Energy Function
3.4.1 Introduction
3.4.2. Non-bonded Interactions
3.4.2.1 Van der Waals energy
3.4.2.2 The electrostatic energy: charges and dipoles
3.4.3 Bonded Interactions
3.4.3.1. The Stretch Energy
3.4.3.2 The Bending Energy
3.4.3.3 The Torsional Energy
3.5 Techniques used for (MD) simulations of metal centered systems
4. Quantum Mechanics
4.1. Basis set Chemistry
4.2 QM Methods and their preference for studying Metal Centered systems
4.3 Restrained Electrostatic Potential (RESP)
5. Different approaches used during parameterization and our strategy
6. Results and Discussion
6.1 Bond and Angle scans for Ni-C and Ni-B
6.2 Bond and Angle parameters of Cofactor F430
6.3 Results of the Mulliken Charges
7. Conclusion
Bibliography
References


1. Introduction

1.1 Metals in Proteins

A commonly cited approximation is that one-third of proteins require metals. A systematic bioinformatics survey of 1,371 different enzymes with known three-dimensional structures estimated that 47% require metals, with 41% containing metals at their catalytic centers (1). It is noteworthy that complicated metal centers can remain poorly defined even after structure determination. Metalloenzymes occur in all six Enzyme Commission classes, accounting for 44% of oxidoreductases, 40% of transferases, 39% of hydrolases, 36% of lyases, 36% of isomerases and 59% of ligases. Magnesium is the most prevalent metal in metalloenzymes. A catalogue of the principal types of enzyme that use each metal reveals that iron (81%), copper (93%), and molybdenum plus tungsten (81%) are most commonly used as electron carriers in oxidoreductases (1). The proportion of all proteins, not just enzymes, that use metals is expected to be less than 47%, and the relative contribution of the different metals may change as metals that perform structural roles, such as zinc in zinc fingers, are more fully accounted for. Figure 1.1 shows the number of proteins in the PDB that use each of the indicated metals.


1.2 Nickel in Biological Systems

Nickel has long been suspected to be an essential trace element for living organisms, but the identification of its functions in molecular terms is relatively recent. The first nickel protein to be identified was urease (urea amidohydrolase). Its nickel content was demonstrated 49 years after the original isolation and crystallization of the enzyme by Sumner (2). This enzyme is widespread, and the specific requirement for nickel explains many of the effects of nickel deficiency in plants.

Certain bacteria, in particular hydrogen bacteria, methanogens and acetogens, were found to have a relatively high demand for nickel as a trace element. These chemotrophic bacteria are named after the metabolic process by which they obtain energy for growth. All of them use hydrogen as a reductant, in different ways. The hydrogen bacteria, such as Alcaligenes and Nocardia, are aerobes and catalyze the oxidation of hydrogen. Methanogens, of which the best-studied genera are Methanobacterium and Methanosarcina, are anaerobes that convert carbon dioxide to methane using reducing agents such as hydrogen gas. The acetogens, such as Clostridium thermoaceticum, Clostridium aceticum and Acetobacterium woodii, are also strict anaerobes and catalyze the conversion of carbon dioxide and carbon monoxide to acetic acid using hydrogen and other reductants.

In general, eight Ni enzymes are known: urease, hydrogenase, CO dehydrogenase (CODH), acetyl-coenzyme A synthetase, methyl-coenzyme M reductase, Ni-superoxide dismutase, glyoxalase I enzymes, and cis-trans isomerase.

The requirement for nickel was explained when it was found that the enzyme hydrogenase contains nickel. Hydrogenases are enzymes that catalyze the production or consumption of hydrogen gas. Not all hydrogenases contain nickel, and in some of those that do, the nickel is EPR silent. The properties of a few of the known Ni hydrogenases are summarized in Table 1.2.


Table 1.2. Composition and spectroscopic properties of typical Ni Hydrogenases

| Organism | Type of hydrogenase | Center | Paramagnetic state |
|---|---|---|---|
| Methanobacterium thermoautotrophicum (methanogenic bacterium) | Soluble, deazaflavin-reducing | Ni | Oxidized, Ni(III); H2-reduced, Ni(I) |
| | | 2-3 [4Fe-4S] | EPR silent |
| Desulfovibrio gigas (sulfate-reducing bacterium) | Soluble, periplasmic | Ni | Oxidized, Ni(III), Ni-A; H2-reduced, Ni(I), Ni-C |
| | | [3Fe-xS] | Oxidized |
| | | [4Fe-4S] | Reduced |
| Chromatium vinosum (anaerobic photosynthetic bacterium) | Membrane-bound | Ni | Oxidized, Ni(III), Ni-A; H2-reduced, Ni(I) |
| | | [3Fe-xS] or [4Fe-4S] | Oxidized |
| Nocardia opaca (hydrogen-oxidizing bacterium) | Soluble, NAD-reducing | Ni | EPR silent |
| | | [3Fe-xS] | Oxidized |
| | | 3 [4Fe-4S] | Reduced |
| | | [2Fe-2S] | Reduced |
| | | FMN | Semiquinone radical |

x denotes the number of sulfur atoms in the iron-sulfur clusters.


In the methanogenic bacteria, nickel is also involved in a second process, in which a complex series of reactions leads to the release of methane gas. The enzyme involved, methyl-coenzyme M reductase, contains cofactor F430, a nickel porphinoid complex. The methanogens belong to an unusual group of bacteria described as Archaebacteria, which appear to be very ancient. Like the acetogens, they are strict anaerobes. Although they are now restricted in habitat, these organisms, with their metabolism involving nickel and the related cobalamins, may have played an important part in the early phases of evolution.

A third nickel-containing enzyme, found in some strictly anaerobic bacteria, is involved in the oxidation of carbon monoxide to carbon dioxide and in the formation of acetyl-coenzyme A from carbon monoxide. This reaction was observed in sewage sludge in 1932 by Fischer (3), one of the co-originators of the Fischer-Tropsch synthesis of hydrocarbons from carbon monoxide. The enzyme has been given the trivial name carbon monoxide dehydrogenase, but this seems illogical, since carbon monoxide contains no hydrogen. Since the principal function of the enzyme is probably to synthesize acetyl-coenzyme A, the name “acetyl-CoA synthase” would be more appropriate. It is a major enzyme of acetogenic bacteria, which have a novel pathway for the fixation of CO2, and it is also found in methanogens.

Compared with other transition metals, nickel is not typically considered a biologically important element; however, it is used by fungi, algae, archaea, eubacteria and higher plants. These organisms exploit the reactivity of nickel in the active sites of various enzymes that catalyze reactions in processes such as carbon dioxide and carbon monoxide metabolism, hydrogen uptake and production, and urea hydrolysis. Many microorganisms employ nickel-mediated processes to colonize environments that are often regarded as inhospitable. For instance, urease facilitates Helicobacter pylori colonization of the acidic human gastric mucosa by producing ammonia, which helps to neutralize the microenvironment of the microbe.


1.3 Coordination chemistry of protein active centers and Ni complexes

At catalytic centres, metals increase the acidity, electrophilicity and/or nucleophilicity of reacting species, promote heterolysis, or receive and donate electrons. The protein’s primary and secondary metal coordination spheres tune the properties of the metal to optimize reactivity and influence metal selection. Donor ligands (S, O or N) can impart a bias in favour of the correct metal. The metal-binding pocket can exclude ions with the wrong charge. Coordination geometry (octahedral, tetrahedral, square pyramidal, trigonal bipyramidal, square planar, trigonal or linear) can impart a bias either in folded apoproteins, if the preformed site is rigid, or during folding, if favourable energetics are coupled to the correct geometry. Figure 1.3 shows the dominant geometries of four divalent metals in proteins:

Figure 1.3. Different coordination geometries of the active centers of the protein

However, because proteins are flexible, steric selection between metals is imperfect, especially in nascent polypeptides. Under these conditions, the relative affinities of metals for proteins are largely governed by the ligand field stabilization energies of the metals themselves. This gives rise to universal orders of preference; for divalent metals, this is the Irving–Williams series (Mn(II) < Fe(II) < Co(II) < Ni(II) < Cu(II) > Zn(II)). There is ambiguity about the position of zinc, which is either at the top of the series or somewhere above cobalt; this ambiguity is attributed to the nephelauxetic effect. Cuprous ions, expected to dominate in more reducing cell environments, are also competitive, and some exceptionally tight ferric complexes are known. Crucially, such affinity series underpin calculations showing that each metal’s relative abundance in the biological locality is paramount in governing selective metal binding by proteins, highlighting the vital contribution of cell biology to the selection of metals by metalloproteins.

The coordination chemistry of nickel spans a variety of geometries, coordination numbers and oxidation states. Nickel complexes are known with oxidation states ranging from -1 to +4; however, the most common oxidation state is Ni(II). The majority of early coordination chemistry work focused on Ni(II) complexes, although recent interest in understanding the nickel centers of redox-active enzymes has shifted attention to the less common oxidation states (-1, 0, +1, +3 and +4). Ni forms a large number of complexes with coordination numbers of 4, 5 or 6, and their geometries include all major structural types. With regard to Lewis acidity, Ni is considered a borderline metal ion, because it binds to both soft and hard ligands and sometimes, albeit rarely, to both in the same complex. Table 1.3 summarizes the oxidation states and geometries that are common for Ni(II) complexes, together with examples from the literature.

Table 1.3. Oxidation States and Stereochemistry of Nickel(II) complexes

| Oxidation state | Coordination number | Geometry | Examples |
|---|---|---|---|
| Ni(II), d8 | 4 | Square planar | NiBr2(PEt3)2, [Ni(CN)4]2- |
| | 4 | Tetrahedral | [NiCl4]2-, NiCl2(PPh3)2 |
| | 5 | Square pyramidal (sp) | [Ni(CN)5]3-, [Ni2Cl8]4- |
| | 5 | Trigonal bipyramidal (tbp) | [Ni(CN)5]3-, Ni(SiCl3)2(CO)3 |
| | 6 | Octahedral | [Ni(NH3)6]2+, [Ni(bipy)3]2+ |


2. Description of the Systems under the Study

The work described here targets two very important nickel-containing metalloproteins: NiFe hydrogenase and methyl-coenzyme M reductase. NiFe hydrogenase contains a NiFe active center and can be found in oxidized and catalytically ready forms. MCR (methyl-coenzyme M reductase) contains the coenzyme F430, a Ni tetrahydrocorphinoid derivative. This chapter summarizes the literature information needed to understand the studied systems and to set up the QM calculations properly.

2.1 Coenzyme F430

Anaerobic bacteria produce 400 million tons of methane annually (4). About 45 million metric tons of this methane escape into the troposphere and contribute significantly to the greenhouse effect (4,5). MCR is a key enzyme common to all methane-producing microorganisms. It catalyzes the last step of methanogenesis, in which methyl-coenzyme M (Me-CoM, Figure 2.1.1) and coenzyme B (Figure 2.1.2) are joined by the formation of a disulfide bond (CoM-S-S-CoB) and methane is released (5).

The process of methanogenesis can start from different substrates, such as glucose, CO2, methanol, etc., and involves different chains of enzymatic reactions, with the participation of different enzymes, to decompose the substrate. However, the step that takes place in the active center of MCR is crucial and common to every substrate. In this final part of the pathway, methyl-coenzyme M is a central intermediate. It is formed from coenzyme M and acetate, CO2 or reduced C1 compounds such as methanol, methylthiols and methylamines via enzymatic reaction pathways. Methyl-coenzyme M is subsequently reduced with coenzyme B to methane, with the concomitant formation of the heterodisulphide of coenzyme M and coenzyme B.

CH3–S–CoM + HS–CoB → CH4 + CoB–S–S–CoM

Recently, indirect evidence that MCR also catalyzes the anaerobic oxidation of methane has been reported (6). Several crystallographic structures of MCR from different microorganisms have been published (7-9). Each MCR is composed of three subunits in an (αβγ)2 structure and contains two noncovalently bound molecules of cofactor F430 located about 50 Å from each other. Coenzyme F430 is a yellow nickel porphinoid found in all methanogenic bacteria. It is the chromophore of the enzyme methyl-CoM reductase, to which the coenzyme is tightly bound. This unusual nickel tetrahydrocorphinoid cofactor (Figure 2.1.3) is most likely the active site of MCR. It has been suggested that the binding of Me-CoM and coenzyme B to one active site induces a conformational change that is required to expel the product heterodisulfide from the second site into the water phase.


Figure 2.1.1 Coenzyme M (left) and methyl-coenzyme M (right), coenzymes required for methyl-transfer reactions in the metabolism of methanogens

Figure 2.1.2 Coenzyme B - coenzyme required for redox reactions in methanogens


Nature uses multiple tetrapyrroles: hemes, chlorophyll and cobalamin (Figure 2.1.4). F430 is the most reduced tetrapyrrole in nature, with only five double bonds. This particular tetrapyrrole derivative is called a corphin. Because of its relative lack of conjugated unsaturation, it is yellow rather than the intense purple-red associated with more unsaturated tetrapyrroles. It is also the only tetrapyrrole derivative found in nature to contain nickel. Ni(II) is too small for the N4 binding site of the corphin, which causes the macrocycle to adopt a ruffled structure. F430 can be considered a nickel counterpart of vitamin B-12.

Figure 2.1.4 From left to right: heme B cofactor, the pigment chlorophyll, and vitamin B-12, also called cobalamin

The cofactor F430 is held in place within the protein via hydrogen bonding to the partially negatively charged carboxylate groups of the Ni-porphinoid. The nickel ion is redox active and is found in EPR-visible nickel(I) and EPR-silent nickel(II) forms. At one time, it was thought that the aromatic porphyrin macrocycle would be planar; in fact, early structure determinations constrained the macrocycle to be planar. However, when high-quality crystallographic determinations of porphyrins and metalloporphyrins began to appear, it soon became obvious that the porphyrin ring is subject to a number of distortions and is often distinctly nonplanar. This nonplanarity is biologically relevant and influences the chemical properties of porphyrin complexes. Most deformations of the porphyrin core are in a direction perpendicular to the tetraaza plane. They can be classified into six classes, based on simple symmetric deformations, one for each out-of-plane symmetry classification of the D4h point group of the square-planar macrocycle. More complicated asymmetric distortions, composed of combinations of these simple distortions, are also found. A normal-coordinate structural decomposition (NSD) procedure has been derived to characterize and quantify porphyrin deformations in proteins. The method expresses the out-of-plane distortions as equivalent distortions along the lowest-frequency normal coordinates of the porphyrin. Nonplanar deformations are important because numerous studies have shown that they have a significant effect on the chemistry of tetrapyrrole complexes. It has been suggested that the nonplanar deformations observed in photosynthetic proteins are responsible for the photophysical and redox properties of chlorophyll pigments. Nonplanar porphyrins are easier to oxidize than planar porphyrins, and both excited-state lifetimes and axial ligand affinity are influenced by nonplanar deformations.
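The projection step at the core of NSD can be illustrated with a minimal sketch. It assumes that the out-of-plane displacements of the macrocycle atoms are already available as a flat list and that the reference mode vectors are orthonormal. In the actual procedure these vectors come from a normal-mode calculation of the porphyrin; the four-atom vectors `MODE_A` and `MODE_B` below are hypothetical toy values chosen only to keep the arithmetic checkable.

```python
import math

# Hypothetical orthonormal "modes" for a toy 4-atom ring; real NSD uses the
# lowest-frequency out-of-plane normal coordinates of the porphyrin core.
MODE_A = [0.5, 0.5, 0.5, 0.5]     # all atoms displaced to the same side
MODE_B = [0.5, -0.5, 0.5, -0.5]   # alternating up/down displacement


def project_onto_modes(displacements, modes):
    """Return the coefficient of each mode and the norm of the part of the
    displacement that the chosen modes do not describe.

    Assumes the mode vectors are orthonormal, so a dot product gives the
    least-squares coefficient directly."""
    coeffs = [sum(d * e for d, e in zip(displacements, mode)) for mode in modes]
    fitted = [sum(c * mode[j] for c, mode in zip(coeffs, modes))
              for j in range(len(displacements))]
    residual = math.sqrt(sum((d - f) ** 2 for d, f in zip(displacements, fitted)))
    return coeffs, residual


# Out-of-plane displacements (Å) of the four toy atoms
coeffs, residual = project_onto_modes([0.2, 0.1, 0.2, 0.1], [MODE_A, MODE_B])
```

Here the displacement is fully described by the two modes (coefficients of about 0.3 and 0.1, with a negligible residual). Dropping `MODE_B` leaves a residual of 0.1, which is how a minimal basis signals a distortion it does not capture. With a non-orthonormal basis, the dot products would have to be replaced by a proper least-squares fit.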

Various experiments have been conducted to determine the magnetic properties of cofactor F430, both in the free state and bound to the protein (10). Axial ligation of F430 by coenzyme M and coenzyme B has been demonstrated experimentally in several articles (10).

The literature describes unbound F430 as EPR silent and diamagnetic (S = 0), with nickel in the Ni(II) oxidation state and in a cationic charge state (11) (Figure 2.1.5). Upon axial ligation, either in aqueous solution or in the protein by coenzyme M, coenzyme B or other ligands (Figure 2.1.6), the Ni(II) center becomes high-spin and paramagnetic, with S = 1 (10).

Figure 2.1.5. Structural representations of F430 taken from different articles; the right side shows coenzyme F430, 15,17³-seco-F430-17³-acid 3, and the corresponding penta- and hexamethyl esters 2 and 4, respectively


Figure 2.1.6. Axially ligated F430 in the protein (extracted from PDB code 1HBN)

Various X-ray structures of this enzyme can be found in the Protein Data Bank. MCR has three different subunits and contains one Zn coordination center and two Ni-containing F430 cofactors; in most refined structures, Na and Mg ions as well as Cl ions are also found on its surface (Figure 2.1.7).

Figure 2.1.7. Different PDB structures of MCR, from left to right with PDB codes 1MRO, 1HBN, 1HBN. Spheres: green, Zn; silver, Ni; yellow, Na; red, Mg; blue, Cl.

For the theoretical studies, the X-ray structure of methyl-coenzyme M reductase with PDB code 1HBN, determined at 1.16 Å resolution, was chosen. The F430 model was simplified to remove unnecessary charge from the system and to reduce the total number of atoms. The carboxylate groups of the Ni-porphinoid were removed, together with the alkane chains connecting them to the main cyclic body of F430, and were replaced by hydrogen atoms. The following structure was used as the starting model (Figure 2.1.8):
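Replacing a removed fragment by a hydrogen atom is commonly done by placing the new H along the bond that was cut. The sketch below shows that geometric step only; the 1.09 Å C-H distance and the function name `cap_with_hydrogen` are assumptions for illustration, not necessarily what was used to build the model in Figure 2.1.8.

```python
import math


def cap_with_hydrogen(kept_atom, removed_atom, bond_length=1.09):
    """Position of a capping hydrogen placed on the kept->removed bond vector.

    kept_atom, removed_atom: Cartesian coordinates (Å) of the atom that stays in
    the model and of the first atom of the fragment being cut away.
    bond_length: assumed C-H distance in Å (1.09 is a typical sp3 value).
    """
    direction = [r - k for r, k in zip(removed_atom, kept_atom)]
    norm = math.sqrt(sum(c * c for c in direction))
    if norm == 0.0:
        raise ValueError("kept and removed atoms coincide")
    return [k + bond_length * c / norm for k, c in zip(kept_atom, direction)]


# A C-C bond of 1.54 Å along z: the new H sits 1.09 Å from the kept carbon
h_position = cap_with_hydrogen([0.0, 0.0, 0.0], [0.0, 0.0, 1.54])
```

Keeping the H on the original bond vector preserves the local geometry of the retained atom; only the bond length changes.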

Figure 2.1.8. Different representations of the prepared system of F430

2.2 NiFe Hydrogenase

The types and functions of the different hydrogenases were briefly summarized in the introduction. Here we concentrate on the particular features of the active center and on the representation of the chosen model.

The active center of NiFe hydrogenase consists of four cysteine residues (two of them bridging ligands connecting Fe and Ni); unusual ligands bound to the Fe atom, one CO and two CN, which are rarely found in enzyme active centers; and a third bridging ligand X that varies depending on the catalytic state of the enzyme. For parameterization, we chose the catalytically active Ni-C form and the inactive oxidized Ni-B form of the active center (Figures 2.2.1 and 2.2.2). Ni-C and Ni-B differ only in the bridging ligand: in the reduced, catalytically active Ni-C state it is a hydride, whereas in the oxidized Ni-B state it is a hydroxide (12).

Various studies in the literature report the magnetic properties and charge states of Ni-C and Ni-B (12). In both Ni-C and Ni-B, the overall charge of the complex is -2 and the spin multiplicity is 2.
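Charge and spin multiplicity must be mutually consistent before a QM calculation can be set up: a doublet such as Ni-C or Ni-B requires an odd total number of electrons. A minimal sketch of that bookkeeping check follows. The element lists of the actual QM models are not reproduced here, so the examples use small hypothetical species, and the atomic-number table covers only the elements appearing in these systems.

```python
# Atomic numbers of the elements that occur in the NiFe and F430 models (subset)
ATOMIC_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8, "S": 16, "Fe": 26, "Ni": 28}


def electron_count(elements, charge):
    """Total number of electrons for a list of element symbols and a net charge."""
    return sum(ATOMIC_NUMBER[e] for e in elements) - charge


def multiplicity_is_consistent(elements, charge, multiplicity):
    """True if 2S+1 = multiplicity is possible for this electron count.

    The number of unpaired electrons (multiplicity - 1) must have the same
    parity as the total electron count and cannot exceed it."""
    n_electrons = electron_count(elements, charge)
    n_unpaired = multiplicity - 1
    return 0 <= n_unpaired <= n_electrons and (n_electrons - n_unpaired) % 2 == 0


# Hydroxide (10 electrons) must be a singlet; a Ni(+1) ion (27 electrons) can be a doublet
print(multiplicity_is_consistent(["O", "H"], -1, 1))   # True
print(multiplicity_is_consistent(["O", "H"], -1, 2))   # False
print(multiplicity_is_consistent(["Ni"], 1, 2))        # True
```

For a full model, the list of element symbols of every atom in the cluster (including capping hydrogens) would be passed together with the total charge of -2 and multiplicity 2.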

For our studies, we chose the NiFe hydrogenase structure with PDB code 1YRQ (oxidized state), determined by X-ray crystallography at 1.83 Å resolution. The ready state Ni-C was obtained by replacing the bridging hydroxide in this structure with a hydride, since no X-ray structure of Ni-C exists.


Figure 2.2.1 Representation of the active center Ni-C: left, complete model with all bonded residues and metal atoms; right, simplified representation including only directly bonded atoms. Metals: silver, nickel; purple, iron.

Figure 2.2.2 Representation of the active center Ni-B: left, complete model with all bonded residues and metal atoms; right, simplified representation including only directly bonded atoms. Metals: silver, nickel; purple, iron.


3. Theory of Molecular Dynamics

3.1. Introduction

One of the principal tools in the theoretical study of biological molecules is molecular dynamics (MD) simulation. This computational method calculates the time-dependent behavior of a molecular system. MD simulations have provided detailed information on the fluctuations and conformational changes of proteins and nucleic acids. These methods are now routinely used to investigate the structure, dynamics and thermodynamics of biological molecules and their complexes. They are also used in the determination of structures from X-ray crystallography and NMR experiments.

Biological molecules exhibit a wide range of time scales over which specific processes occur; for example:

- Local motions (0.01 to 5 Å, 10⁻¹⁵ to 10⁻¹ s)
  - Atomic fluctuations
  - Side-chain motions
  - Loop motions
- Rigid-body motions (1 to 10 Å, 10⁻⁹ to 1 s)
  - Helix motions
  - Domain motions (hinge bending)
  - Subunit motions
- Large-scale motions (> 5 Å, 10⁻⁷ to 10⁴ s)
  - Helix-coil transitions
  - Dissociation/association
  - Folding and unfolding

Molecular dynamics simulations permit the study of complex, dynamic processes that occur in biological systems. These include, for example:

- Protein stability
- Conformational changes
- Protein folding
- Molecular recognition: proteins, DNA, membranes, complexes
- Ion transport in biological systems

The following sections describe the basics of the molecular dynamics methodology, starting from classical mechanics and the numerical integration techniques at the heart of MD, followed by the definition of the force field, its independent energy terms, and the potential energy function.

3.2. Historical Background

The molecular dynamics method was first introduced by Alder and Wainwright in the late 1950s (Alder and Wainwright, 1957, 1959) (13) to study the interactions of hard spheres. Many important insights concerning the behaviour of simple liquids emerged from their studies. The next major advance was in 1964, when Rahman carried out the first simulation using a realistic potential for liquid argon (Rahman, 1964). The first molecular dynamics simulation of a realistic system was done by Rahman and Stillinger in their simulation of liquid water in 1974 (Stillinger and Rahman, 1974). The first protein simulations appeared in 1977 with the simulation of the bovine pancreatic trypsin inhibitor (BPTI) (McCammon et al., 1977). Today one routinely finds in the literature molecular dynamics simulations of solvated proteins, protein-DNA complexes and lipid systems, addressing a variety of issues including the thermodynamics of ligand binding and the folding of small proteins. The number of simulation techniques has greatly expanded; there now exist many specialized techniques for particular problems, including mixed quantum mechanical/classical simulations that are employed to study enzymatic reactions in the context of the full protein. Molecular dynamics simulation techniques are also widely used in experimental procedures such as X-ray crystallography and NMR structure determination.

3.3. Classical Mechanics

The molecular dynamics simulation method is based on Newton’s second law, or the equation of motion, F = ma, where F is the force exerted on the particle, m is its mass and a is its acceleration. From knowledge of the force on each atom, it is possible to determine the acceleration of each atom in the system. Integration of the equations of motion then yields a trajectory that describes the positions, velocities and accelerations of the particles as they vary with time. From this trajectory, the average values of properties can be determined. The method is deterministic: once the positions and velocities of each atom are known, the state of the system can be predicted at any time in the future or the past. Molecular dynamics simulations can be time consuming and computationally expensive; however, computers are getting faster and cheaper. Simulations of solvated proteins are typically run up to the nanosecond time scale, although simulations reaching the millisecond regime have been reported.

Newton’s equation of motion is given by

$$\mathbf{F}_i = m_i \mathbf{a}_i$$

where F_i is the force exerted on particle i, m_i is the mass of particle i and a_i is the acceleration of particle i. The force can also be expressed as the gradient of the potential energy,

$$\mathbf{F}_i = -\nabla_i V$$

Combining these two equations yields

$$-\frac{dV}{d\mathbf{r}_i} = m_i \frac{d^2 \mathbf{r}_i}{dt^2}$$

where V is the potential energy of the system. Newton’s equation of motion can then relate the derivative of the potential energy to the changes in position as a function of time.

Consider a simple application of the above method: motion with constant acceleration. Newton’s equation can then be written as

$$F = m a = m \frac{dv}{dt}$$

and after integration we obtain an expression for the velocity,

$$v = a t + v_0$$

and since

$$v = \frac{dx}{dt}$$

we can integrate once again to obtain

$$x = x_0 + \int_0^t v \, dt'$$

Combining this equation with the expression for the velocity, we obtain the following relation, which gives the value of x at time t as a function of the acceleration, a, the initial position, x_0, and the initial velocity, v_0:

$$x = \frac{1}{2} a t^2 + v_0 t + x_0$$

The acceleration is given as the derivative of the potential energy with respect to the position, r:

$$a = -\frac{1}{m} \frac{dV}{dr}$$

Therefore, to calculate a trajectory, one only needs the initial positions of the atoms, an initial distribution of velocities and the acceleration, which is determined by the gradient of the potential energy function. The equations of motion are deterministic, i.e., the positions and the velocities at time zero determine the positions and velocities at all other times, t. The initial positions can be obtained from experimental structures, such as the X-ray crystal structure of the protein or the solution structure determined by NMR spectroscopy.


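The initial distribution of velocities is usually determined from a random distribution, with the magnitudes conforming to the required temperature and corrected so that there is no overall momentum, i.e.,

$$\mathbf{P} = \sum_{i=1}^{N} m_i \mathbf{v}_i = 0$$

The velocities, v_i, are often chosen randomly from a Maxwell-Boltzmann or Gaussian distribution at a given temperature, which gives the probability that an atom i has a velocity v_ix in the x direction at a temperature T:

$$p(v_{ix}) = \left(\frac{m_i}{2\pi k_B T}\right)^{1/2} \exp\left[-\frac{1}{2}\frac{m_i v_{ix}^2}{k_B T}\right]$$

where k_B is Boltzmann’s constant. The temperature can be calculated from the velocities using the relation

$$T = \frac{1}{3 N k_B} \sum_{i=1}^{N} m_i |\mathbf{v}_i|^2$$

where N is the number of atoms in the system.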
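The initialization steps just described (Gaussian sampling, removal of the net momentum, and evaluation of the temperature) can be sketched as follows. The sketch works in reduced units with k_B = 1 by default and rescales the velocities so that the instantaneous temperature matches the target exactly. That rescaling, and counting 3N degrees of freedom rather than 3N - 3 after the momentum is removed, are simplifying assumptions; production MD codes handle both details in their own ways.

```python
import math
import random


def instantaneous_temperature(masses, velocities, k_b=1.0):
    """T = sum_i m_i |v_i|^2 / (3 N k_B), counting 3N degrees of freedom."""
    twice_kinetic = sum(m * sum(c * c for c in v) for m, v in zip(masses, velocities))
    return twice_kinetic / (3 * len(masses) * k_b)


def initial_velocities(masses, temperature, k_b=1.0, seed=None):
    """Maxwell-Boltzmann velocities with zero total momentum, scaled to `temperature`.

    Needs at least two atoms (a single atom has no motion left once its
    momentum is removed)."""
    rng = random.Random(seed)
    # Each Cartesian component is Gaussian with variance k_B T / m_i
    velocities = []
    for m in masses:
        sigma = math.sqrt(k_b * temperature / m)
        velocities.append([rng.gauss(0.0, sigma) for _ in range(3)])
    # Subtract the centre-of-mass velocity so that sum_i m_i v_i = 0
    total_mass = sum(masses)
    v_com = [sum(m * v[k] for m, v in zip(masses, velocities)) / total_mass
             for k in range(3)]
    velocities = [[v[k] - v_com[k] for k in range(3)] for v in velocities]
    # A common scale factor keeps the total momentum at zero
    scale = math.sqrt(temperature / instantaneous_temperature(masses, velocities, k_b))
    return [[scale * c for c in v] for v in velocities]


masses = [12.0, 1.0, 1.0, 16.0, 14.0, 58.7]   # arbitrary example masses
vels = initial_velocities(masses, temperature=300.0, seed=1)
```

Without the final rescaling, the sampled temperature would only fluctuate around the target, with fluctuations that shrink as the number of atoms grows.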

3.4 Integration Algorithms

The potential energy is a function of the atomic positions (3N) of all the atoms in the system. Due to the complicated nature of this function, there is no analytical solution to the equations of motion; they must be solved numerically.

Numerous numerical algorithms have been developed for integrating the equations of motion. We list several here.

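All the integration algorithms assume that the positions, velocities and accelerations can be approximated by a Taylor series expansion:

$$\mathbf{r}(t+\delta t) = \mathbf{r}(t) + \mathbf{v}(t)\,\delta t + \tfrac{1}{2}\mathbf{a}(t)\,\delta t^2 + \dots$$

$$\mathbf{v}(t+\delta t) = \mathbf{v}(t) + \mathbf{a}(t)\,\delta t + \tfrac{1}{2}\mathbf{b}(t)\,\delta t^2 + \dots$$

$$\mathbf{a}(t+\delta t) = \mathbf{a}(t) + \mathbf{b}(t)\,\delta t + \dots$$

where r is the position, v is the velocity (the first derivative with respect to time), a is the acceleration (the second derivative with respect to time) and b is the third derivative of the position with respect to time.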

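To derive the Verlet algorithm, one can write

$$\mathbf{r}(t+\delta t) = \mathbf{r}(t) + \mathbf{v}(t)\,\delta t + \tfrac{1}{2}\mathbf{a}(t)\,\delta t^2$$

$$\mathbf{r}(t-\delta t) = \mathbf{r}(t) - \mathbf{v}(t)\,\delta t + \tfrac{1}{2}\mathbf{a}(t)\,\delta t^2$$

Summing these two equations, one obtains

$$\mathbf{r}(t+\delta t) = 2\mathbf{r}(t) - \mathbf{r}(t-\delta t) + \mathbf{a}(t)\,\delta t^2$$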

The Verlet algorithm uses positions and accelerations at time t and the positions from time t-dt to calculate new positions at time t+dt. The Verlet algorithm uses no explicit velocities. The advantages of the Verlet algorithm are, i) it is straightforward, and ii) the storage requirements are modest. The disadvantage is that the algorithm is of moderate precision.
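A one-dimensional sketch of the position Verlet update follows. Because no velocity is stored, the second position has to be bootstrapped; using a second-order Taylor step for it is an assumption of this sketch. For a constant acceleration, the scheme reproduces x = ½at² + v0 t + x0 exactly, which provides a convenient check.

```python
def verlet(x0, v0, accel, dt, n_steps):
    """Position Verlet: x(t+dt) = 2 x(t) - x(t-dt) + a(t) dt^2.

    accel(x) returns the acceleration at position x. The first step uses a
    Taylor expansion because x(-dt) is not known. Returns n_steps + 1
    positions (n_steps >= 1)."""
    positions = [x0, x0 + v0 * dt + 0.5 * accel(x0) * dt * dt]
    for _ in range(n_steps - 1):
        x_prev, x_curr = positions[-2], positions[-1]
        positions.append(2.0 * x_curr - x_prev + accel(x_curr) * dt * dt)
    return positions


# Constant acceleration a = 2: at t = 1.0, x = 1.0 + 0.5 * 1.0 + 0.5 * 2.0 * 1.0**2 = 2.5
xs = verlet(x0=1.0, v0=0.5, accel=lambda x: 2.0, dt=0.1, n_steps=10)
```

Only the two most recent positions are needed at each step, which reflects the modest storage requirements mentioned above.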


3.4.1 The Leap-frog algorithm

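$$\mathbf{r}(t+dt) = \mathbf{r}(t) + \mathbf{v}(t+\tfrac{1}{2}dt)\,dt$$
$$\mathbf{v}(t+\tfrac{1}{2}dt) = \mathbf{v}(t-\tfrac{1}{2}dt) + \mathbf{a}(t)\,dt$$

Or, in terms of the force $\mathbf{F}_i(t)$ acting on atom i of mass $m_i$:

$$\mathbf{v}_i(t+\tfrac{1}{2}dt) = \mathbf{v}_i(t-\tfrac{1}{2}dt) + \frac{\mathbf{F}_i(t)}{m_i}\,dt$$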

In this algorithm, the velocities are first calculated at time t+½dt; these are used to calculate the positions, r, at time t+dt. In this way, the velocities leap over the positions, then the positions leap over the velocities. The advantage of this algorithm is that the velocities are explicitly calculated; the disadvantage is that they are not calculated at the same time as the positions. The velocities at time t can be approximated by the relationship:

$$\mathbf{v}(t) = \tfrac{1}{2}\left[\mathbf{v}(t-\tfrac{1}{2}dt) + \mathbf{v}(t+\tfrac{1}{2}dt)\right]$$
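The leap-frog update can be sketched the same way, again assuming a one-dimensional harmonic oscillator in reduced units; the helper name `leapfrog` and the way v(−dt/2) is estimated from the initial force are illustrative assumptions. Because the velocities live on half steps, the velocity returned at time t is reconstructed by averaging the two neighboring half-step velocities.

```python
import math


def leapfrog(x0, v0, force, mass, dt, n_steps):
    """Leap-frog integration in 1-D; returns x(t) and the synchronised v(t) at t = n_steps*dt."""
    # Start the half-step velocity at t = -dt/2 from the Taylor expansion
    v_half = v0 - 0.5 * dt * force(x0) / mass
    x = x0
    for _ in range(n_steps):
        v_half += dt * force(x) / mass   # v(t + dt/2) = v(t - dt/2) + a(t) dt
        x += dt * v_half                 # r(t + dt) = r(t) + v(t + dt/2) dt
    # v(t) = [v(t - dt/2) + v(t + dt/2)] / 2 needs one more half-step velocity
    v_next_half = v_half + dt * force(x) / mass
    return x, 0.5 * (v_half + v_next_half)


x_t, v_t = leapfrog(1.0, 0.0, lambda x: -x, 1.0, dt=0.01, n_steps=1000)
print(x_t, math.cos(10.0))    # position vs exact solution
print(v_t, -math.sin(10.0))   # synchronised velocity vs exact solution
```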

3.4.2 The Velocity Verlet algorithm

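This algorithm yields positions, velocities and accelerations at the same time t, with no compromise on precision:

$$\mathbf{r}(t+dt) = \mathbf{r}(t) + \mathbf{v}(t)\,dt + \tfrac{1}{2}\mathbf{a}(t)\,dt^{2}$$
$$\mathbf{v}(t+dt) = \mathbf{v}(t) + \tfrac{1}{2}\left[\mathbf{a}(t) + \mathbf{a}(t+dt)\right]dt$$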
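A minimal sketch of velocity Verlet, under the same assumptions as before (one-dimensional harmonic oscillator, k = m = 1, illustrative function name). Since positions and velocities are available at the same time t, the total energy can be checked directly; for a stable time step it fluctuates slightly but does not drift.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Velocity Verlet: positions, velocities and accelerations all at the same time t."""
    a = force(x) / mass
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt**2        # r(t+dt)
        a_new = force(x) / mass                 # a(t+dt) from the new positions
        v = v + 0.5 * (a + a_new) * dt          # v(t+dt) uses a(t) and a(t+dt)
        a = a_new
    return x, v


k, m = 1.0, 1.0
energy = lambda x, v: 0.5 * m * v**2 + 0.5 * k * x**2
x, v = velocity_verlet(1.0, 0.0, lambda x: -k * x, m, dt=0.01, n_steps=1000)
print(energy(x, v) - energy(1.0, 0.0))  # total-energy error stays small (order dt^2)
```

Only one force evaluation per step is needed, because a(t+dt) is reused as a(t) in the next step.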


3.4.3 Beeman’s algorithm

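This algorithm is closely related to the Verlet algorithm:

$$\mathbf{r}(t+dt) = \mathbf{r}(t) + \mathbf{v}(t)\,dt + \tfrac{2}{3}\mathbf{a}(t)\,dt^{2} - \tfrac{1}{6}\mathbf{a}(t-dt)\,dt^{2}$$
$$\mathbf{v}(t+dt) = \mathbf{v}(t) + \tfrac{1}{3}\mathbf{a}(t+dt)\,dt + \tfrac{5}{6}\mathbf{a}(t)\,dt - \tfrac{1}{6}\mathbf{a}(t-dt)\,dt$$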

The advantage of this algorithm is that it provides a more accurate expression for the velocities and better energy conservation. The disadvantage is that the more complex expressions make the calculation more expensive.
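Beeman's update can be sketched as follows for the same one-dimensional test oscillator. The extra cost comes from keeping the previous acceleration a(t − dt); at t = 0 that history does not exist, so the sketch estimates it from a backward Taylor step, which is an assumption of this example rather than part of the method's definition.

```python
import math


def beeman(x, v, force, mass, dt, n_steps):
    """Beeman's algorithm in 1-D; needs a(t - dt) in addition to a(t)."""
    a = force(x) / mass
    # Estimate a(-dt) from a backward Taylor step, since no earlier history exists
    a_prev = force(x - v * dt + 0.5 * a * dt**2) / mass
    for _ in range(n_steps):
        x_new = x + v * dt + (2.0 / 3.0) * a * dt**2 - (1.0 / 6.0) * a_prev * dt**2
        a_new = force(x_new) / mass
        v = v + (1.0 / 3.0) * a_new * dt + (5.0 / 6.0) * a * dt - (1.0 / 6.0) * a_prev * dt
        x, a_prev, a = x_new, a, a_new
    return x, v


x_t, v_t = beeman(1.0, 0.0, lambda x: -x, 1.0, dt=0.01, n_steps=1000)
print(x_t, math.cos(10.0))
print(v_t, -math.sin(10.0))
```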

3.4. Force Field and Potential Energy Function

3.4.1 Introduction

In order to apply the numerical algorithms described above to the study of dynamic processes in molecular systems, besides the initial coordinates and velocities of every atom we need to know the forces acting on every atom in the molecule. These are usually derived from a potential energy $U(\mathbf{r}^N)$, where $\mathbf{r}^N = (\mathbf{r}_1, \mathbf{r}_2, \dots, \mathbf{r}_N)$ represents the complete set of 3N atomic coordinates. The force acting on atom i is then:

$$\mathbf{F}_i = -\nabla_{\mathbf{r}_i} U(\mathbf{r}^N) = -\frac{\partial U(\mathbf{r}^N)}{\partial \mathbf{r}_i}$$

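The potential energy function (force field energy) is written as a sum of terms that are assumed to be independent, each describing the energy associated with a different contribution (Figure 3.4.1):

$$U_{\mathrm{FF}} = U_{\mathrm{str}} + U_{\mathrm{bend}} + U_{\mathrm{tors}} + U_{\mathrm{vdw}} + U_{\mathrm{el}}$$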


Figure 3.4.1. Energetic contributions forming the potential energy function of the system

• $U_{\mathrm{vdw}}$ describes the repulsion or attraction between atoms.

• $U_{\mathrm{el}}$ is due to the internal redistribution of the electrons, creating positive and negative parts of the molecule.

• $U_{\mathrm{str}}$ is the energy function for stretching a bond between two atoms.

• $U_{\mathrm{bend}}$ represents the energy required for bending an angle.

• $U_{\mathrm{tors}}$ is the torsional energy for rotation around a bond.

Of the energetic terms above, $U_{\mathrm{str}}$, $U_{\mathrm{bend}}$ and $U_{\mathrm{tors}}$ are bonded (intramolecular) interactions, while $U_{\mathrm{vdw}}$ and $U_{\mathrm{el}}$ are non-bonded interactions, which act both between molecules and between atoms of the same molecule that are not directly bonded.

Given such an energy function of the nuclear coordinates, geometries and relative energies can be calculated by optimization. Stable molecules correspond to minima on the potential energy surface, and they can be located by minimizing $U_{\mathrm{FF}}$ as a function of the nuclear coordinates. Conformational transitions can be described by locating transition structures on the $U_{\mathrm{FF}}$ surface. There are various methods for optimizing such a multi-dimensional function (steepest descent, conjugate gradient, Newton-Raphson, etc.), which will not be discussed here.


3.4.2. Non-bonded Interactions

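Intermolecular interactions are modeled by a potential, which is a function of the positions of the nuclei. The potential energy due to non-bonded interactions between N particles can be divided into terms that depend on individual atoms, pairs, triplets and so on:

$$U(\mathbf{r}^N) = \sum_{i} u_1(\mathbf{r}_i) + \sum_{i}\sum_{j>i} u_2(\mathbf{r}_i, \mathbf{r}_j) + \sum_{i}\sum_{j>i}\sum_{k>j} u_3(\mathbf{r}_i, \mathbf{r}_j, \mathbf{r}_k) + \dots$$

where $\mathbf{r}^N = (\mathbf{r}_1, \mathbf{r}_2, \dots, \mathbf{r}_N)$ stands for the complete set of 3N particle coordinates. Here, $u_1(\mathbf{r}_i)$ represents the effect of an external field (including the container walls), $u_2(\mathbf{r}_i, \mathbf{r}_j)$ represents pairwise interactions and $u_3(\mathbf{r}_i, \mathbf{r}_j, \mathbf{r}_k)$ three-body interactions. Most work considers only pairwise interactions, since their contribution is the most significant. The pair potential depends only on the magnitude of the pair separation, $r_{ij} = |\mathbf{r}_i - \mathbf{r}_j|$, and the potential energy is then written in terms of the pair potential as:

$$U(\mathbf{r}^N) \approx \sum_{i}\sum_{j>i} u(r_{ij})$$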

The form of the potential depends on the system under investigation. In biomolecular force fields the non-bonded pair potential has two contributions: electrostatic and van der Waals.

3.4.2.1 Van der Waals energy

The van der Waals energy describes the repulsion or attraction between atoms that are not directly bonded. $U_{\mathrm{vdw}}$ may be interpreted as the non-polar part of the interaction, not related to the electrostatic energy due to (atomic) charges. This may, for example, be the interaction between two methane molecules, or between two methyl groups at different ends of the same molecule.

$U_{\mathrm{vdw}}$ is zero for large interatomic distances and becomes very repulsive for short distances. In quantum mechanical terms, the latter is due to the overlap of the electron clouds of the two atoms, as the negatively charged electrons repel each other. At intermediate distances, however, there is a slight attraction between two such electron clouds from induced dipole-dipole interactions, physically due to electron correlation. Even if the molecule (or part of a molecule) has no permanent dipole moment, the motion of the electrons will create a slightly uneven distribution at a given time. This dipole moment will induce a charge polarization in the neighboring molecule (or another part of the same molecule), creating an attraction, and it can be derived theoretically that this attraction varies as the inverse sixth power of the distance between the two fragments. The induced dipole-dipole interaction is only the leading term of such induced multipole interactions: there are also contributions from induced dipole-quadrupole, quadrupole-quadrupole, etc., interactions. These vary as $R^{-8}$, $R^{-10}$, etc., and the $R^{-6}$ dependence is only the asymptotic behavior at long distances. The force associated with this potential is often referred to as a “dispersion” or “London” force. The van der Waals term is the only interaction between rare gas atoms (and thus the reason why, say, argon can become a liquid and a solid), and it is the main interaction between non-polar molecules such as alkanes.

$U_{\mathrm{vdw}}$ is very positive at small distances, has a minimum that is slightly negative at a distance corresponding to the two atoms just “touching” each other, and approaches zero as the distance becomes large. A general functional form that fits these conditions is

$$U_{\mathrm{vdw}}(R^{AB}) = E_{\mathrm{repulsion}}(R^{AB}) - \frac{C^{AB}}{(R^{AB})^{6}}$$

It is not possible to derive the functional form of the repulsive interaction theoretically; it is only required that it goes toward zero as R goes to infinity, and that it approaches zero faster than the $R^{-6}$ term, since the energy should go towards zero from below.

A popular function that obeys these general requirements is the Lennard-Jones (LJ) potential, where the repulsive part is given by an $R^{-12}$ dependence ($C_1$ and $C_2$ are suitable constants):

$$U_{\mathrm{LJ}}(R^{AB}) = \frac{C_1}{(R^{AB})^{12}} - \frac{C_2}{(R^{AB})^{6}}$$

The Lennard-Jones potential can also be written as:

$$U_{\mathrm{LJ}}(R^{AB}) = \varepsilon\left[\left(\frac{R_0}{R^{AB}}\right)^{12} - 2\left(\frac{R_0}{R^{AB}}\right)^{6}\right]$$

Here $R_0$ is the minimum energy distance and $\varepsilon$ the depth of the minimum. There are no theoretical arguments for choosing the exponent in the repulsive part to be 12; this is purely a computational convenience, and there is evidence that an exponent of 9 or 10 gives better results.

From electronic structure theory it is known that the repulsion is due to overlap of the electronic wave functions, and furthermore that the electron density falls off approximately exponentially with the distance from the nucleus (the exact wave function for the hydrogen atom is an exponential function). There is therefore some justification for choosing the repulsive part as an exponential function (Buckingham or Hill type potentials, and the Morse potential).

Although such functions reproduce the distance dependence of the van der Waals energy better than the Lennard-Jones potential, they require more input parameters and are considerably more expensive to compute. This matters because the van der Waals energy is one of the most computationally expensive terms in (MD) simulations, as it must be evaluated for a very large number of atom pairs.

In the Lennard-Jones potential there are two input parameters that differ between types of atom pairs. The van der Waals minimum distance is taken as the sum of the van der Waals radii of the two atoms in the interacting pair, and the interaction parameter as the geometric mean of the atomic “softness” constants:

$$R_0^{AB} = R_0^{A} + R_0^{B}$$
$$\varepsilon^{AB} = \sqrt{\varepsilon^{A}\varepsilon^{B}}$$

So for each atom type there are two parameters to be determined: the van der Waals radius, $R_0^{A}$, and the atomic softness, $\varepsilon^{A}$.
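The combination rules and the $(R_0, \varepsilon)$ form of the Lennard-Jones potential can be checked numerically with a short sketch. The atom-type parameters below are illustrative values only (not fitted or taken from a specific force field), and the function names are hypothetical; the checks confirm that the minimum lies at $R_0^{AB}$ with depth $-\varepsilon^{AB}$.

```python
import math


def lj_pair_params(r0_a, eps_a, r0_b, eps_b):
    """Combine per-atom-type parameters: R0_AB = R0_A + R0_B, eps_AB = sqrt(eps_A * eps_B)."""
    return r0_a + r0_b, math.sqrt(eps_a * eps_b)


def lennard_jones(r, r0, eps):
    """U_LJ(R) = eps * [(R0/R)^12 - 2 (R0/R)^6]; minimum of -eps at R = R0."""
    x6 = (r0 / r) ** 6
    return eps * (x6 * x6 - 2.0 * x6)


# Illustrative atom-type parameters (radius in Å, eps in kcal/mol)
r0_ab, eps_ab = lj_pair_params(1.9080, 0.0860, 1.8240, 0.1700)
print(r0_ab, eps_ab)                               # pair minimum distance and well depth
print(lennard_jones(r0_ab, r0_ab, eps_ab))         # -eps_ab at the minimum
print(lennard_jones(3.0 * r0_ab, r0_ab, eps_ab))   # close to zero at long range
```

The $R^{-12}$ term is computed as the square of the $R^{-6}$ term, which is the computational convenience mentioned above.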

It should be noted that since the van der Waals energy is calculated between pairs of atoms, but parameterized against experimental data, the derived parameters represent an effective pair potential, which at least partly includes many-body contributions.

3.4.2.2 The electrostatic energy: charges and dipoles

The other part of the non-bonded interaction is due to the internal (re)distribution of the electrons, creating positive and negative parts of the molecule. A carbonyl group, for example, has a negatively charged oxygen and a positively charged carbon. At the lowest approximation, this can be modeled by assigning (partial) charges to each atom. Alternatively, the bond may be assigned a bond dipole moment. These two descriptions give similar (but not identical) results. Only in the long distance limit of interaction between such molecules do the two descriptions give identical results.

The interaction between point charges is given by the Coulomb potential, with $\varepsilon$ being a dielectric constant:

$$U_{\mathrm{el}}(R^{AB}) = \frac{Q^{A}Q^{B}}{\varepsilon R^{AB}}$$

The atomic charges can be assigned by empirical rules, but are more commonly assigned by fitting to the electrostatic potential calculated by electronic structure methods, as discussed later. Since hydrogen bonding is to a large extent due to attraction between the electron-deficient hydrogen and an electronegative atom such as oxygen or nitrogen, a proper choice of partial charges may adequately model this interaction.

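Some force fields use a bond dipole description for $U_{\mathrm{el}}$. The interaction between two dipoles is given by:

$$U_{\mathrm{el}}(R^{AB}) = \frac{\mu^{A}\mu^{B}}{\varepsilon (R^{AB})^{3}}\left(\cos\chi - 3\cos\alpha_A\cos\alpha_B\right)$$

Here $\mu^{A}$ and $\mu^{B}$ are the dipole moments, $R^{AB}$ is the distance between the centers of the two dipoles, $\chi$ is the angle between the two dipole vectors, and $\alpha_A$ and $\alpha_B$ are the angles between each dipole vector and the line connecting the dipole centers.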

When properly parameterized, there is little difference in the performance of the two ways of representing Uel.

The “effective” dielectric constant $\varepsilon$ can be included to model the effect of surrounding molecules (solvent) and the fact that interactions between distant sites may be “through” part of the same molecule, i.e. a polarization effect. A value of 1 for $\varepsilon$ corresponds to a vacuum, while a large $\varepsilon$ reduces the importance of long-range charge-charge interactions. Typically, a value between 1 and 4 is used, although there is little theoretical justification for any specific value. In some applications the dielectric constant is made distance dependent (e.g. $\varepsilon = \varepsilon_0 R^{AB}$, changing the Coulomb interaction to $Q^{A}Q^{B}/(\varepsilon_0 (R^{AB})^{2})$) to model the “screening” by solvent molecules. There is little theoretical justification for this, but it increases the efficiency of the calculation, since only the squared distance is needed and a square root operation is thus avoided, and it seems to provide reasonable results.
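The difference between a constant and a distance-dependent dielectric can be illustrated with a small sketch. It assumes energies in kcal/mol, distances in Å and charges in units of e, with a Coulomb conversion constant of approximately 332.06 kcal·Å/(mol·e²); the partial charges and the 2 Å separation are hypothetical values chosen only for illustration. The distance-dependent version needs only the squared distance, so no square root is evaluated.

```python
import math

# Coulomb constant in kcal*Å/(mol*e^2); approximate value for the assumed unit system
COULOMB_K = 332.0637


def coulomb_constant_eps(qa, qb, pos_a, pos_b, eps=1.0):
    """U_el = k * qA * qB / (eps * R): needs the distance R, i.e. a square root."""
    r = math.dist(pos_a, pos_b)
    return COULOMB_K * qa * qb / (eps * r)


def coulomb_distance_dependent(qa, qb, pos_a, pos_b, eps0=1.0):
    """eps = eps0 * R gives U_el = k * qA * qB / (eps0 * R^2): only R^2 is needed."""
    r2 = sum((a - b) ** 2 for a, b in zip(pos_a, pos_b))
    return COULOMB_K * qa * qb / (eps0 * r2)


# Hypothetical partial charges (e) on a carbonyl O and an amide H, 2 Å apart
q_o, q_h = -0.57, 0.27
o, h = (0.0, 0.0, 0.0), (0.0, 0.0, 2.0)
for eps in (1.0, 4.0):
    print(f"eps = {eps}: {coulomb_constant_eps(q_o, q_h, o, h, eps):.2f} kcal/mol")
print(f"eps = R: {coulomb_distance_dependent(q_o, q_h, o, h):.2f} kcal/mol")
```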

It is clear that two directly bonded atoms should not have an Evdw or Eel term; their interaction is described by Estr. It is also clear that the interaction between two hydrogens at each end of, say, CH3(CH2)50CH3 is identical to the interaction between two hydrogens belonging to two different molecules, and they should therefore have an Evdw and an Eel term. Most force fields include Evdw and Eel for atom pairs that are separated by three bonds or more, although 1,4-interactions are in many cases scaled down by a factor between 1 and 2. This means that the rotational profile for an A-B-C-D sequence is determined both by Etors and by the Evdw and Eel terms for the A-D pair. In a sense, Etors may be considered as a correction necessary for obtaining the correct rotational profile once the non-bonded contribution has been accounted for. Some force fields have chosen also to include Evdw for atoms that are 1,3 with respect to each other; these are called Urey-Bradley force fields. In this case, the energy required to bend a three-atom sequence is a mixture of Ebend and Evdw. Most modern force fields calculate Estr between all atom pairs that are 1,2 with respect to each other in terms of bonding, Ebend for all pairs that are 1,3, Etors between all pairs that are 1,4, and Evdw and Eel between all pairs that are 1,4 or higher.
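The bookkeeping described above (exclude 1,2 and 1,3 pairs, scale 1,4 pairs, keep everything further apart) can be sketched by counting bonds along the shortest path with a breadth-first search. The function names and the five-atom chain are hypothetical; the 1,4 divisors 2.0 (van der Waals) and 1.2 (electrostatic) are example values within the range of 1 to 2 mentioned in the text, as used for instance in AMBER-type force fields.

```python
from collections import deque


def bond_separation(bonds, n_atoms, start):
    """Number of bonds on the shortest path from `start` to every atom (breadth-first search)."""
    neighbours = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)
    dist = [None] * n_atoms
    dist[start] = 0
    queue = deque([start])
    while queue:
        i = queue.popleft()
        for j in neighbours[i]:
            if dist[j] is None:
                dist[j] = dist[i] + 1
                queue.append(j)
    return dist


def nonbonded_pairs(bonds, n_atoms, scale_vdw_14=2.0, scale_el_14=1.2):
    """Map each pair that gets Evdw/Eel to its (vdw, el) weight.

    1,2 and 1,3 pairs are excluded (covered by Estr and Ebend); 1,4 pairs are
    divided by the scale factors; pairs further apart, or not connected at all
    (different molecules), get the full interaction.
    """
    pairs = {}
    for i in range(n_atoms):
        dist = bond_separation(bonds, n_atoms, i)
        for j in range(i + 1, n_atoms):
            d = dist[j]
            if d is not None and d <= 2:
                continue                                                  # 1,2 or 1,3: excluded
            if d == 3:
                pairs[(i, j)] = (1.0 / scale_vdw_14, 1.0 / scale_el_14)   # 1,4: scaled
            else:
                pairs[(i, j)] = (1.0, 1.0)                                # 1,5 and beyond
    return pairs


# Four-atom chain A-B-C-D (0-1-2-3) plus a fifth atom E bonded to D
chain = [(0, 1), (1, 2), (2, 3), (3, 4)]
for pair, factors in nonbonded_pairs(chain, 5).items():
    print(pair, factors)
```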

For polar molecules, the electrostatic energy dominates the force field energy function, and an accurate representation is therefore important for obtaining good results. The methods for obtaining partial charges will be discussed in the following sections.

3.4.3 Bonded Interactions

As mentioned above, there are three types of bonded interactions, which are treated as independent, uncoupled energy terms in (MD) methodology: the stretching, bending and torsional energies. In the following sections we discuss their formulation and show why parameters must be derived for these types of interactions.

3.4.3.1. The Stretch Energy

$U_{\mathrm{str}}$ is the energy function for stretching a bond between two atom types A and B. In its simplest and most common form, it is written as a Taylor expansion around a “natural”, or “equilibrium”, bond length, $R_0$. Terminating the expansion at second order gives:

$$U_{\mathrm{str}}(R^{AB} - R_0^{AB}) = U(0) + \frac{dU}{dR}\left(R^{AB} - R_0^{AB}\right) + \frac{1}{2}\frac{d^{2}U}{dR^{2}}\left(R^{AB} - R_0^{AB}\right)^{2} + \dots$$

The derivatives are evaluated at $R = R_0$, and the $U(0)$ term is normally set to zero, since this is just the zero point of the energy scale. The second term is zero because the expansion is around the equilibrium value. In its simplest form the stretching energy can thus be written as:

$$U_{\mathrm{str}}(R^{AB} - R_0^{AB}) = K^{AB}\left(R^{AB} - R_0^{AB}\right)^{2} = K^{AB}\left(\Delta R^{AB}\right)^{2}$$

Here $K^{AB}$ is the “force constant” for the A-B bond (in this convention $K^{AB} = \tfrac{1}{2}\, d^{2}U/dR^{2}$). This is the form of a harmonic oscillator, with the potential being quadratic in the displacement from the minimum.


The harmonic form is the simplest possible and is sufficient for determining most equilibrium geometries. There are certain strained and crowded systems where the results from a harmonic approximation are significantly different from experimental values, and if the force field should be able to reproduce features such as vibrational frequencies, the functional form for $U_{\mathrm{str}}$ must be improved. The straightforward approach is to include more terms in the Taylor expansion:

$$U_{\mathrm{str}}(\Delta R^{AB}) = K_2^{AB}(\Delta R^{AB})^{2} + K_3^{AB}(\Delta R^{AB})^{3} + K_4^{AB}(\Delta R^{AB})^{4} + \dots$$

This of course has a price: more parameters have to be assigned.

Polynomial expansions of the stretch energy do not have the correct limiting behavior. The cubic anharmonicity constant $K_3$ is normally negative, and if the Taylor expansion is terminated at third order, the energy will go toward $-\infty$ for long bond lengths. Minimization of the energy with such an expression can cause the molecule to fly apart if a poor starting geometry is chosen. The quartic constant $K_4$ is normally positive and the energy will go toward $+\infty$ for long bond lengths if the Taylor series is terminated at fourth order. The correct limiting behavior for a bond stretched to infinity is that the energy should converge towards the dissociation energy. A simple function that satisfies this criterion is the Morse potential:

$$E_{\mathrm{Morse}}(\Delta R) = D\left[1 - e^{-\alpha \Delta R}\right]^{2}, \qquad \alpha = \sqrt{\frac{K^{AB}}{D}}$$

Here D is the dissociation energy and $\alpha$ is related to the force constant (with the value above, the Morse function has the same curvature at the minimum as the harmonic term $K^{AB}(\Delta R^{AB})^{2}$). The Morse function reproduces the actual behavior quite accurately over a wide range of distances. There are, however, some difficulties with the Morse potential in actual applications. For long bond lengths the restoring force is quite small. Distorted structures, which may either be a poor starting geometry or one that develops during a simulation, will therefore display a slow convergence towards the equilibrium bond length. For minimization purposes and simulations at ambient temperatures (e.g. 300 K) it is sufficient that the potential is reasonably accurate up to about 9.55 kcal/mol (40 kJ/mol) above the minimum (the average kinetic energy is about 0.9 kcal/mol, or 3.7 kJ/mol, at 300 K). In this energy range there is little difference between a Morse potential and a Taylor expansion, and most force fields therefore employ a simple polynomial, usually just the harmonic (second-order) term, for the stretch energy:

$$U_{\mathrm{str}}(R^{AB}) = K^{AB}\left(R^{AB} - R_0^{AB}\right)^{2}$$

R0 is the parameter which, when used to calculate the minimum energy structure of the molecule will produce a geometry having the experimental equilibrium bond length. If there were only one stretch energy in the whole force field energy expression (i.e. a diatomic molecule), R0 would be the equilibrium bond length.

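For each bond type, i.e. a bond between two atom types A and B, there are at least two parameters to be determined: the force constant $K^{AB}$ and the reference bond length $R_0^{AB}$. The higher order expansions, and the Morse potential, have at least one additional parameter (for example $K_3^{AB}$, or the dissociation energy D in the Morse potential) that needs to be determined.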
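The claim that harmonic and Morse stretch energies differ little near the minimum, but very much for large stretches, can be checked with a short sketch. The force constant K and dissociation energy D below are illustrative values, not parameters of any real bond; α is chosen as $\sqrt{K/D}$ so that both functions have the same curvature at ΔR = 0 in the $K(\Delta R)^2$ convention used here. The last lines reproduce the average kinetic energy per atom, (3/2)RT, at 300 K.

```python
import math


def harmonic(dr, k):
    """U_str = K * dR^2 (force-field convention used in the text, no factor 1/2)."""
    return k * dr**2


def morse(dr, k, d):
    """E_Morse = D * (1 - exp(-alpha*dR))^2 with alpha = sqrt(K/D) (same curvature at dR = 0)."""
    alpha = math.sqrt(k / d)
    return d * (1.0 - math.exp(-alpha * dr)) ** 2


# Illustrative values only: K in kcal/(mol*Å^2), D in kcal/mol
K, D = 300.0, 100.0
for dr in (-0.10, -0.05, 0.05, 0.10, 0.15, 1.0, 5.0):
    print(f"dR = {dr:+.2f} Å  harmonic = {harmonic(dr, K):8.3f}  Morse = {morse(dr, K, D):8.3f}")

# Average kinetic energy per atom, (3/2) R T, at 300 K
R_KJ = 8.314462618e-3                    # gas constant in kJ/(mol*K)
ekin = 1.5 * R_KJ * 300.0
print(f"{ekin:.2f} kJ/mol = {ekin / 4.184:.2f} kcal/mol")   # 3.74 kJ/mol = 0.89 kcal/mol
```

For displacements of a few hundredths of an ångström the two functions are nearly identical, while at ΔR = 5 Å the Morse energy levels off close to D and the harmonic energy keeps growing.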

3.4.3.2 The Bending Energy

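$E_{\mathrm{bend}}$ is the energy required for bending an angle formed by three atoms A – B – C, where there is a bond between A and B, and between B and C. Similarly to the stretch energy, $E_{\mathrm{bend}}$ is usually expanded as a Taylor series around a natural bond angle and terminated at second order, giving the harmonic approximation:

$$E_{\mathrm{bend}}(\theta^{ABC} - \theta_0^{ABC}) = K^{ABC}\left(\theta^{ABC} - \theta_0^{ABC}\right)^{2}$$

where $\theta^{ABC}$ is the A – B – C angle and $\theta_0^{ABC}$ its natural value.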

While the simple harmonic expansion is adequate for most applications, there may be cases where higher accuracy is required. The next improvement is to include a third order term, analogous to Estr. This can give a very good description over a large range of angles.

For each combination of three atom types, A, B and C, there are at least two bending parameters to be determined: the force constant $K^{ABC}$ and the natural angle $\theta_0^{ABC}$.
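In practice the angle θ entering $E_{\mathrm{bend}}$ is computed from Cartesian coordinates at every step. A minimal NumPy sketch follows; the geometry, the force constant of 50 kcal/(mol·rad²) and the 109.5° reference angle are illustrative assumptions, not fitted parameters.

```python
import numpy as np


def bond_angle(a, b, c):
    """Angle A-B-C (radians) at the central atom B from Cartesian coordinates."""
    u, w = a - b, c - b
    cos_theta = np.dot(u, w) / (np.linalg.norm(u) * np.linalg.norm(w))
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))   # clip guards against round-off


def bend_energy(theta, k_abc, theta0):
    """E_bend = K_ABC * (theta - theta0)^2, angles in radians."""
    return k_abc * (theta - theta0) ** 2


# Illustrative geometry and parameters (K in kcal/(mol*rad^2)), not fitted values
a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 0.0, 0.0])
c = np.array([0.0, 1.0, 0.0])
theta = bond_angle(a, b, c)
print(np.degrees(theta))                             # 90 degrees
print(bend_energy(theta, 50.0, np.radians(109.5)))   # penalty for the 19.5° deviation
```

Angles are kept in radians inside the energy function, so the force constant must be expressed per rad² for the result to be consistent.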


3.4.3.3 The Torsional Energy

$E_{\mathrm{tors}}$ describes part of the energy change associated with rotation around a B – C bond in a four-atom sequence A – B – C – D, where A – B, B – C and C – D are bonded. The torsional angle $\omega$ is defined as the angle between the A – B and C – D bonds when viewed along the B – C bond. The angle may be taken to be in the range (0°, 360°) or (−180°, 180°).

The torsional energy is fundamentally different from Estr and Ebend in three aspects:

1. A rotational barrier has contributions from both the non-bonded (van der Waals and electrostatic) terms, as well as the torsional energy, and torsional parameters are therefore intimately coupled to the non-bonded parameters.

2. The torsional energy function must be periodic in the angle: if the bond is rotated by 360°, the energy should return to the same value.

3. The cost in energy for distorting a molecule by rotation around a bond is often low, i.e. large deviations from the minimum energy structure may occur, and a Taylor expansion in the torsional angle is therefore not a good idea.

To encompass the periodicity, $E_{\mathrm{tors}}$ is written as a Fourier series in the torsional angle $\omega$:

$$E_{\mathrm{tors}}(\omega) = \sum_{n=1} V_n \cos(n\omega)$$

The n = 1 term describes a rotation that is periodic by 360°, the n = 2 term is periodic by 180°, the n = 3 term is periodic by 120°, and so on. The $V_n$ constants determine the size of the barrier for rotation around the B – C bond. Depending on the situation, some of these $V_n$ constants may be zero.

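The formula used in the majority of force fields has the following expression:

$$E_{\mathrm{tors}}(\omega) = \sum_{n} \frac{V_n}{2}\left[1 + \cos(n\omega - \gamma_n)\right]$$

where $\gamma_n$ is a phase angle (typically 0° or 180°).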
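A minimal sketch of how the torsional angle and the cosine-series energy can be evaluated from coordinates, using the standard atan2 expression for a dihedral angle. The coordinates and the single threefold term ($V_3$ = 1.4 kcal/mol, $\gamma_3$ = 0°, an ethane-like H-C-C-H rotor) are hypothetical values chosen for illustration.

```python
import numpy as np


def dihedral(a, b, c, d):
    """Torsional angle A-B-C-D in degrees, in the range [-180, 180]."""
    b1, b2, b3 = b - a, c - b, d - c
    n1, n2 = np.cross(b1, b2), np.cross(b2, b3)   # normals of the ABC and BCD planes
    x = np.dot(n1, n2)
    y = np.linalg.norm(b2) * np.dot(b1, n2)
    return np.degrees(np.arctan2(y, x))


def torsion_energy(omega_deg, terms):
    """E_tors = sum_n V_n/2 * [1 + cos(n*omega - gamma_n)]; terms = [(V_n, n, gamma_n_deg), ...]."""
    omega = np.radians(omega_deg)
    return sum(0.5 * v * (1.0 + np.cos(n * omega - np.radians(g))) for v, n, g in terms)


b = np.array([0.0, 0.0, 0.0])
c = np.array([0.0, 0.0, 1.0])
a = np.array([1.0, 0.0, 0.0])
for d in (np.array([-1.0, 0.0, 1.0]), np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])):
    print(dihedral(a, b, c, d))          # trans (±180), cis (0) and a 90° rotamer

# Hypothetical threefold term for an ethane-like H-C-C-H rotor (V3 in kcal/mol)
ethane_like = [(1.4, 3, 0.0)]
for omega in (0.0, 60.0, 180.0):
    print(omega, torsion_energy(omega, ethane_like))   # eclipsed maximum, staggered minima
```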


3.5 Techniques used for (MD) simulations of metal centered systems

As we can see, performing (MD) simulations requires parameters for the bonded, van der Waals, electrostatic and other energy terms of the particular atom types involved. For systems such as metal-centered complexes, the variety of bond and angle types is sometimes so large and specific that they are only poorly covered by biomolecular force fields. On the other hand, (MD) simulations are extensively used for proteins and require such parameters, as almost half of the enzymes have metal atoms at the active site, and describing the conformational changes of the active site is especially important compared with the rest of the protein. For performing (MD) simulations of metalloenzymes, the following three techniques are typically used:

• The non-bonded approach is the roughest approximation for treating metal atoms during (MD) simulations. It describes the metal only through the van der Waals and electrostatic energy terms, omitting any bonded interaction, so that bonds can break and form as the metal approaches or moves away from a coordinating residue.

• The cationic dummy atom approach is the second approximation. In this model the metal interactions are described by a set of cationic dummy atoms arranged around the metal so as to meet the coordination-chemistry requirements of the center. The charge of the metal atom is distributed equally among these dummy atoms, while the mass and van der Waals parameters remain centered on the original metal atom.

• The bonded-model approach requires all the necessary parameters (bonded, electrostatic, etc.) to be available, so that a standard (MD) simulation can be carried out without any further approximation at this level.

Implementing this third approach requires the derivation of the corresponding parameters, which is the aim of the current work.


4. Quantum Mechanics

4.1. Basis set Chemistry

The LCAO (Linear Combination of Atomic Orbitals) approximation makes it possible to find solutions of the Schrödinger equation as combinations of atomic orbitals. In other words, the molecular orbitals are expanded as linear combinations of basis functions that, in principle, describe atomic orbitals. A basis set is the set of such functions used in the expansion. Various basis sets are used in QM calculations, ranging from minimal basis sets to systematically extendable sets that, depending on the QM method used, converge toward the complete basis set limit, which gives the most precise solution attainable with that method. Some of them are listed and described below:

STO-nG - the most common minimal basis set. n is the number of primitive Gaussian functions combined to approximate each Slater-type orbital basis function.

X-YZg - split-valence basis sets, also known as Pople basis sets. Here X is the number of primitive Gaussians comprising each core atomic orbital basis function, while Y and Z indicate that each valence orbital is described by two basis functions, composed of linear combinations of Y and Z primitive Gaussian functions, respectively (split-valence double zeta). Split-valence triple and quadruple zeta basis sets (X-YZWg, X-YZWVg) are also used.

cc-pVDZ - correlation-consistent basis sets, designed to converge systematically toward the complete basis set limit, which can be estimated using empirical extrapolation techniques.

SDD - an alternative basis set for the entire periodic table that uses effective core potentials (pseudopotentials) to reduce the number of basis functions (for the core electrons) and to include relativistic effects (for heavy elements).


There are many other basis sets in extensive use, but considering them here would take us too far afield.

4.2 QM Methods and their preference for studying Metal Centered systems

Over the last decade, significant progress has been made in applying quantum mechanical methods to chemical problems involving the structures and reactions of molecules. In quantum mechanics (QM), the energy and other related properties can be calculated by solving the Schrödinger equation. However, exact solutions of the Schrödinger equation are not computationally feasible for most molecules. Therefore, several families of electronic structure methods have been developed: semiempirical, ab initio, and density functional theory (DFT). The most common ab initio method is the Hartree-Fock (HF) method, in which the primary approximation is the central field approximation: each electron experiences the average effect of the electron-electron repulsion instead of the explicit repulsion interaction. Because of this approximation, HF calculations do not include electron correlation, and the resulting energies are always higher than the exact energy. Transition metal systems typically require explicit electron correlation, and HF is therefore usually not suitable for them. Since post-HF ab initio methods and DFT include electron correlation, these methods are more appropriate for transition metals. One of the post-HF methods that includes electron correlation is the multiconfiguration self-consistent field (MCSCF) method, which uses multiple determinants to describe the wave function. MCSCF calculations are employed for near-degenerate states and open-shell systems, and can be very accurate, but their cost in CPU time is very high. In an MCSCF calculation, not only the coefficients of the determinants but also the orbitals are optimized. This type of calculation requires the specification of the active orbitals to be included in the active space. An MCSCF calculation in which all combinations of the active-space orbitals are included is called a complete active space self-consistent field (CASSCF) calculation. CASSCF calculations recover the full electron correlation within the chosen active (valence) space.

The DFT family of methods has become very popular over the last decade because these methods are less computationally expensive than post-HF methods. In practice, a DFT calculation involves a computational effort similar to that required for an HF calculation. The basis of DFT is that the energy of a molecule can be determined from the electron density instead of a wave function. The B3LYP hybrid functional is the most widely used functional because of the good results obtained for a large range of compounds. It combines Becke's three-parameter hybrid exchange functional (three fitting parameters are used) with the Lee, Yang and Parr correlation functional.

4.3 Restrained Electrostatic Potential (RESP)

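The electrostatic energy dominates the force field energy function in almost all force fields, and an accurate representation is therefore important for obtaining good results. Within the partial charge model, the atomic charges are normally assigned by fitting to the molecular electrostatic potential (MEP) calculated by an electronic structure method. The electrostatic potential at a point $\mathbf{r}$ is given by the nuclear charges $Z_a$ at positions $\mathbf{R}_a$ and by the electron density $\rho$ obtained from the electronic wave function, as follows (in atomic units):

$$\phi_{\mathrm{ESP}}(\mathbf{r}) = \sum_{a}^{N_{\mathrm{nuclei}}} \frac{Z_a}{|\mathbf{R}_a - \mathbf{r}|} - \int \frac{\rho(\mathbf{r}')}{|\mathbf{r}' - \mathbf{r}|}\, d\mathbf{r}'$$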

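The fitting is done by minimizing an error function of the following form, under the constraint that the sum of the partial charges $Q_a$ is equal to the total molecular charge:

$$\mathrm{ErrF}(\mathbf{Q}) = \sum_{k}^{N_{\mathrm{points}}} \left(\phi_{\mathrm{ESP}}(\mathbf{r}_k) - \sum_{a}^{N_{\mathrm{atoms}}} \frac{Q_a}{|\mathbf{R}_a - \mathbf{r}_k|}\right)^{2}, \qquad \sum_{a} Q_a = Q_{\mathrm{tot}}$$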

The electrostatic potential is sampled at a few thousand points in the near vicinity of the molecule. The set of linear equations arising from minimizing the error function is often poorly conditioned, so the calculated partial charges are sensitive to small details in the fitting data. The physical reason for this is that the electrostatic potential is primarily determined by the atoms near the surface of the molecule, while the atoms buried within the molecule have very little influence on the external electrostatic potential. A straightforward fitting therefore often results in unrealistically large charges for the non-surface atoms. The problem can to some extent be avoided by adding a hyperbolic penalty term for having non-zero partial charges, since this ensures that only those charges that are important for the electrostatic potential have values significantly different from zero. This scheme is known as Restrained Electrostatic Potential (RESP) fitting; it is widely regarded as one of the most reliable charge-derivation schemes and is the standard procedure for the AMBER force fields.
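The fitting procedure can be sketched in a few lines of NumPy. This is a minimal re-implementation of the idea, not the RESP program distributed with AMBER: the function name `resp_fit`, the synthetic three-charge test system and the iterative linearization of the restraint are assumptions made for illustration. The hyperbolic restraint $a\sum_j(\sqrt{q_j^2+b^2}-b)$ uses a = 0.0005 and b = 0.1, values commonly used in the first stage of standard RESP fits; because the synthetic data here are in arbitrary units, the restraint is weak in this example.

```python
import numpy as np


def resp_fit(atoms, points, potential, total_charge, a=0.0005, b=0.1, n_iter=50, tol=1e-10):
    """Fit point charges to an electrostatic potential with a hyperbolic restraint.

    Minimises  sum_k (phi_k - sum_j q_j / r_kj)^2  +  a * sum_j (sqrt(q_j^2 + b^2) - b)
    subject to sum_j q_j = total_charge (Lagrange multiplier). The restraint is
    non-linear, so its gradient is linearised and the system re-solved until the
    charges stop changing.
    """
    inv_r = 1.0 / np.linalg.norm(points[:, None, :] - atoms[None, :, :], axis=2)  # (k, j)
    A = inv_r.T @ inv_r                 # A_ij = sum_k 1/(r_ki r_kj)
    B = inv_r.T @ potential             # B_j  = sum_k phi_k / r_kj
    n = len(atoms)
    q = np.zeros(n)
    for _ in range(n_iter):
        # d/dq_j of the restraint is a*q_j/sqrt(q_j^2+b^2): add a/sqrt(q_j^2+b^2) to the diagonal
        w = a / np.sqrt(q**2 + b**2)
        M = np.zeros((n + 1, n + 1))
        M[:n, :n] = 2.0 * A + np.diag(w)
        M[:n, n] = 1.0                  # Lagrange multiplier column
        M[n, :n] = 1.0                  # total-charge constraint row
        rhs = np.append(2.0 * B, total_charge)
        q_new = np.linalg.solve(M, rhs)[:n]
        if np.max(np.abs(q_new - q)) < tol:
            return q_new
        q = q_new
    return q


# Synthetic test: potential generated by known charges on three collinear "atoms"
atoms = np.array([[-1.2, 0.0, 0.0], [0.0, 0.0, 0.0], [1.2, 0.0, 0.0]])
true_q = np.array([0.4, -0.8, 0.4])
rng = np.random.default_rng(1)
dirs = rng.normal(size=(2000, 3))
points = 4.0 * dirs / np.linalg.norm(dirs, axis=1, keepdims=True)   # shell of radius 4
phi = (1.0 / np.linalg.norm(points[:, None, :] - atoms[None, :, :], axis=2)) @ true_q

print(resp_fit(atoms, points, phi, 0.0, a=0.0))   # plain ESP fit recovers true_q
print(resp_fit(atoms, points, phi, 0.0))          # restrained fit, still sums to zero
```

With a = 0 the sketch reduces to the unrestrained error-function fit; the restraint only adds a charge-dependent term to the diagonal, which is why a few linear solves are enough.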


5. Different approaches used during parameterization and our strategy

When deriving parameters for (MD) simulations, different approaches are used, each with its own advantages, goals and limitations. The main aim when deriving force constants for biomolecular force fields is to obtain the second derivative of the energy with respect to the internal coordinate of interest. The first derivative of the energy with respect to a coordinate gives the (negative) force, and the second derivative gives the curvature of the energy surface, from which the force constant is obtained. The easiest way to obtain these constants is to compute the second derivatives of the multidimensional energy function and extract them from the Hessian matrix (its diagonal elements). This technique requires a vibrational normal-mode frequency analysis of the whole system during the QM calculations; because every degree of freedom is left free to move, the internal coordinates are coupled to each other, and the force constants obtained are highly coupled as well. In practice, significant coupling exists only between particular internal coordinates that are linked through a sequence of a few bonds, while coupling with the rest of the system is much weaker.
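The Hessian route can be illustrated on a toy model with two internal coordinates and an explicit coupling term. The energy function, its constants and the central-difference step are hypothetical choices for illustration; in a real calculation the Hessian would come from the QM frequency analysis. The diagonal elements give the force constants (½H_ii in the $K(\Delta R)^2$ convention), while the off-diagonal element is the coupling that a diagonal-only extraction discards.

```python
import numpy as np


def numerical_hessian(energy, x0, h=1e-4):
    """Central-difference Hessian H_ij = d^2 E / dx_i dx_j at the point x0."""
    n = len(x0)
    hess = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = np.zeros(n), np.zeros(n)
            ei[i], ej[j] = h, h
            hess[i, j] = (energy(x0 + ei + ej) - energy(x0 + ei - ej)
                          - energy(x0 - ei + ej) + energy(x0 - ei - ej)) / (4.0 * h * h)
    return hess


# Toy model: displacements of two internal coordinates (e.g. a bond and an adjacent angle)
# with a coupling term c*x*y; the constants are arbitrary illustration values
k1, k2, c = 250.0, 60.0, 15.0
energy = lambda q: k1 * q[0] ** 2 + k2 * q[1] ** 2 + c * q[0] * q[1]

H = numerical_hessian(energy, np.zeros(2))
print(H)                   # diagonal 2*k1, 2*k2; off-diagonal c = coupling
print(0.5 * np.diag(H))    # force constants K in the U = K*dR^2 convention
```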

The second way of obtaining force constants is, instead of taking them from the Hessian, to compute the change in energy as a function of the particular bond or angle of interest. This is done by progressively shortening and lengthening the bond (or closing and opening the angle) of interest, calculating the energy at every step, plotting the energy as a function of the coordinate and fitting this profile to an analytical function. For the majority of cases, at least near the equilibrium value, this function is harmonic. This procedure is called bond and angle scanning.

When scanning bonds and angles to obtain force constants, usually all the rest of the system is kept restrained. Thus, whereas the Hessian calculations give strong coupling between all degrees of freedom, restraining the rest of the system gives no coupling at all.

In our calculations of the angle and bond force constants, we allow flexibility of the metal atom and of the residues between which each bond or angle constant is calculated. The rest of the residues, as well as the backbone atoms of the flexible residues, are kept frozen. At each step of shortening or lengthening the bond (or changing the angle), the flexible part of the system is pre-optimized before the single-point energy calculation, so that the coupling between the relevant degrees of freedom is included. The force constants are then derived from the harmonic part of the energy profile (Figure 5.1). Bond scans were performed with 10 steps of increment and decrement of 0.04 Å each; angle scans were performed with 12 steps of 1° each, with some exceptions described below.
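The fitting step can be sketched with `numpy.polyfit`. The scan below mimics the protocol described above (10 steps of 0.04 Å on each side of the reference bond length), but the energies are synthetic, a harmonic term plus a cubic anharmonicity standing in for QM single-point energies, and the Ni-N bond length, the constants and the ±0.20 Å "harmonic window" are hypothetical choices. Fitting a parabola to the near-equilibrium points returns K in the $K(R - R_0)^2$ convention; an angle scan is handled the same way after converting the angles to radians.

```python
import numpy as np

# Hypothetical relaxed-scan data: 10 steps of 0.04 Å on each side of a Ni-N bond.
# The energies are synthetic (harmonic term plus cubic anharmonicity), standing in
# for the QM single-point energies of a real scan.
r_eq, k_true, k3 = 1.90, 150.0, -120.0            # Å, kcal/(mol*Å^2), kcal/(mol*Å^3)
r = r_eq + 0.04 * np.arange(-10, 11)               # 21 points from 1.50 to 2.30 Å
energy = k_true * (r - r_eq) ** 2 + k3 * (r - r_eq) ** 3

# Keep only the near-equilibrium (harmonic) part of the profile, here |dR| <= 0.20 Å
mask = np.abs(r - r_eq) <= 0.20 + 1e-9
coeffs = np.polyfit(r[mask] - r_eq, energy[mask], 2)   # [c2, c1, c0], highest power first
k_fit = coeffs[0]                                      # K in U = K * (R - R0)^2
r0_fit = r_eq - coeffs[1] / (2.0 * coeffs[0])          # vertex of the fitted parabola

print(f"K  = {k_fit:.1f} kcal/(mol*Å^2)")
print(f"R0 = {r0_fit:.4f} Å")
```

Because the synthetic points are symmetric about the reference length, the odd cubic term does not change the fitted curvature, only the position of the fitted minimum; with real QM data the choice of window matters more.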

After researching and defining the chemical structure of the system studied, as well as its physical parameters (Section 2), the first step before starting the scan procedure is to optimize the geometry of the system. We performed this at the following QM level of theory: DFT with the B3LYP functional, the SDD basis set on the metal atoms (when used in Gaussian, SDD replaces the metal core electrons with an effective core potential and describes the remaining electrons with a valence basis set) and 6-31G(d,p) on the non-metal atoms. The same level of theory was used during the bond and angle scans (because of time limitations, the angle scan was not performed for Ni-C). Charges were calculated with the 6-311+G(3df,3pd) and 6-311+G(2d,2p) basis sets separately for all three models, to compare their performance (Figure 5.1). The outcome is presented and compared at the end of the Results section.

[Figure 5.1 flowchart: Optimization (B3LYP/SDD:6-31G(d,p)) → RESP (B3LYP/6-311+G(2d,2p) / 6-311+G(3df,2pd)) → Bond and angle scan (B3LYP/6-31G(d,p)) → Fitting the distribution into a function and deriving force constants]

Figure 5.1. Graphical representation of the different steps of performed procedure

During the optimization, constraints were imposed on the backbone atoms of the NiFe hydrogenase, whereas no constraints were applied to the F430 cofactor. For the two models of NiFe hydrogenase, during the bond and angle scans the whole system was frozen except the metal atom and the residues participating in the scan, whose backbone atoms were also constrained. For the F430 cofactor, which has a very rigid structure, only the atoms directly bonded to the metal and, in each ring, the two atoms adjacent to the metal-bonded atom were made flexible, as depicted in Figure 5.2; the rest of the system was frozen.


Figure 5.2. Representation of F430 cofactor with flexible atoms highlighted.


6. Results and Discussion

6.1 Bond and Angle scans for Ni-C and Ni-B

As a result of the optimization procedure, slightly distorted coordination geometries were obtained for both the catalytically active Ni-C and the oxidized inactive Ni-B states of the active center of NiFe hydrogenase. The geometries obtained agree well with various literature sources suggesting a coordination chemistry of high spin Fe (Figures 6.1.1 and 6.1.2).

Figure 6.1.1 Different representations of Ni-B after the optimization