Wednesday, January 18, 2012

Adding Hydrogen Atoms Is No Mean Feat

I've had a problem since Monday. Hydrogen atoms. You take an MD from say GROMACS or Tinker and make a 16 Angstrom extract of the entire system (this is Chorismate Mutase, PDB: 2CHT) in pymol via the command

cmd.select("sele", "(byres (sele expand 16))",enable=1)

so you can run QM/MM on it afterwards (see picture of the actual cut-out or look at my github page of the pdb-file with missing hydrogens).


However, doing the extract (which cuts the peptide-bond) improper chemistry appears around this bond since atoms are missing - on one side (where you remove atoms) it does not matter since you remove it all together, but on the side that remains, this is a big problem since the chemical valence is not correct. I want to add hydrogens where atoms are missing, but no standard solution was apparently available to add hydrogens to the bonds that were cut. Trying in pymol added hydrogens all over the place and the usual Open Babel tools also made funny business with the termini in its clever algorithms for proteins. In the end, it was time to revisit the good old trusty Open Babel and scour the API to actually see if something was usable. Again, I struck gold and here is what I did to actually make it work.


Chemically speaking, the local bonding environment is not satisfied at the points where cleavage took place. Open Babel has functionality to get the assumed valence of an atom as well as the actual valence (based on connectivity) - this we will use to our advantage and make sure it only applies to backbone carbons (not $C_{\alpha}$) and not to Oxygen.

For convenience I'm looping over residues and then atoms in residues and then seeing if the assumed and actual valencies are matching. This is not nescessary as I do not use any information about the residues (yet). If the valencies are not matching I add hydrogen(s) to the atom in question to satisfy the valency.

The following code does this

import sys
import openbabel
from obutil import *

if __name__ == "__main__":
  if( len(sys.argv) != 3 ):
    print "Usage: protonate in.pdb out.pdb"
    sys.exit()
  file_in = sys.argv[1]
  file_out = sys.argv[2]

  mol = OBMolFromFilename(file_in)
  for residue in openbabel.OBResidueIter(mol):
    rname = residue.GetName()
    chain = residue.GetChain()
    for atom in openbabel.OBResidueAtomIter(residue):
      if atom.IsCarbon() or atom.IsNitrogen():
        idx = atom.GetIdx()
        imval = atom.GetImplicitValence()
        reval = atom.GetValence()
        if imval != reval:
          print "%5i (%3s %s) %2i %2i" % (idx,rname,chain,imval,reval)
          mol.AddHydrogens(atom)

  OBMolToFilename(mol, file_out)

Notice it uses a new obutil.py module I've made with some convenience functions to easily access features of Open Babel. By using the code on the supplied model1.pdb file, one ends up with

    1 (VAL A)  3  2
    3 (VAL A)  3  2
   17 (ASP A)  3  2
  239 (LEU A)  3  2
  256 (PRO A)  3  2
  258 (PRO A)  3  2
  270 (VAL A)  3  2
  397 (THR A)  3  2
  409 (LEU A)  3  2
  808 (GLU A)  3  2
  821 (CYS A)  3  2
  910 (MET A)  3  2
  925 (GLY C)  3  2
 1008 (THR C)  3  2
 1020 (LEU C)  3  2
 1022 (LEU C)  3  2
 1039 (ILE C)  3  2
 1041 (ILE C)  3  2
 1058 (LEU C)  3  2
 1159 (LEU C)  3  2
 1176 (MET C)  3  2
 1450 (MET C)  3  2
 1465 (GLN C)  3  2
 1467 (GLN C)  3  2
 1482 (ILE C)  3  2

and there you have it: A protonated cutout from an MD simulation which does not have reprotonated atoms.

NB! There is one issue with this approach which concerns the actual output to PDB. There is no reordering of the atoms when you save the file, so if you depend on the order of the atoms, then this approach needs some more work. A colleague (and I) have yet to figure out a clever way to actually do this.