Wednesday, November 30, 2011

Cut'n'Frag: Fragment smart using SMARTS

In our research group, part of the research we do is method development. To facilitate these new methods, tools are often required to setup various calculations or treat output files. Some of this research[1] is in fragment based methods[2] where a large system is divided into several smaller pieces and the total property of the whole system can be assembled from the individual fragments.

To help facilitate easier setup of fragment based calculations, we've made a tool (still under development) called FragIt which we use to fragment large molecules and prepare input files for GAMESS[3]. This post is about one key aspect of FragIt: How to use Open Babel[4] to figure out where to cut bonds in a protein.

For starters, one needs to load a molecule from a file, this is accomplished by the following lines of code[*]

import openbabel
obmol = openbabel.OBMol()
obconv = openbabel.OBConversion()
obconv.ReadFile(obmol, filename)

which basically tells OpenBabel that we like to open a file with a fileformat specified by the extension of the filename("pdb").

Open Babel includes functionality to do SMARTS[5] pattern matching[6]. Basically, you write a pattern to search for a substructure in a molecule. The way we want to use it is to identify an atom A on one side of a bond, and an atom B on the other side of that bond. Thus, using the following SMARTS pattern on a protein


we can search for A: 'find carbon connected to nitrogen on one side' and B: 'find carbon connected to an oxygen on the other side'. Results are only returned if the above pattern mathes on both sides of a bond at the same time. The code to do this is

matches = [m for m in obpat.GetUMapList()]

where the last line converts some horrible <openbabel.vectorvInt; proxy ... > SWIG code into a list of tuples with matching atom ID's, i.e.

print matches


[(2, 3), (11, 12), (32, 33), (44, 45), (58, 59), (73, 74), (87, 88), (94, 95), (108, 109), (132, 133)]

Each tuple has information about the found atoms as (A, B).

The real beauty of this is of course that you can make your own patterns and match whatever you want in whichever molecule you chose to play with.

You can find this example as well as others on my github page.

[*] We opted for the full fledged API of OpenBabel, thus it takes 5 lines of code to actually open a file. The more pythonic library pybel[7] which is also included in OpenBabel was not used.

update: Thanks to Anders Christensen for pointing me towards an easy syntax highlighter.


  1. Hi Casper,
    Sorry to bother. I notice that you use effective fragment molecular orbital method a lot. Therefore, I really appreciate if you can comment my questions about effective fragment potential. I always want to use effective fragment potential to perform some simple MD simulations. As you know, a big plus of EFP is that we do not to spend a long time on parametrization in order to get a reasonable good force field. GAMESS do support MD simulations and MC simulations with EFP. I tried several times to use EFP (fragments only) for MD and MC simulations with periodic boundary conditions, following the suggestions in the manual of GAMESS. But neither works. I always got problems such as "EQLRAT has failed to converge". Probably you have met similar problem when using FMO method?

  2. Hi lewispenn,

    I've not had such issues with the FMO code. I must admit that I've yet to run an MD (or MC for that matter) calculation in GAMESS.

    A quick look in the source code gave me nothing except that EQLRAT is part of another subroutine to locate eigenvalues and eigenvectors.

    did you try and ask in the GAMESS google group (!forum/gamess) ?