Friday, April 20, 2012

The Wonders of Numpy

I originally envisioned this post as a wall of code showing just how ugly working code can be when it is written without any checks that it does what I want it to do (see the public gist for that). I've been parsing text lately: a weird and wonderful combination of taking the molecular orbital (MO) coefficients of a calculation and transforming them into a new basis set to calculate spin-spin coupling constants. This involves reading the input file to get the exponents of the uncontracted basis set, matching them with the corresponding MO coefficients, and generating a new segmented contracted basis set.

In the table below we have a segmented basis set where the primitive Gaussian-type orbitals (PGTOs) are combined into contracted Gaussian-type orbitals (CGTOs). In this case, the basis set has three degrees of freedom (three CGTOs), down from five (five PGTOs). Dark fields in the table show non-zero coefficients; bright fields are zero.


The PGTOs are obtained from the .mol file (I'm using Dalton for this) using the script found in the gist, since that script is used anyway to generate the new basis set. The CGTOs are obtained from an RHF calculation (see the hf.dal file in the gist) using the script.

The hard part of designing a new basis set is choosing how many of the PGTOs to use in a CGTO. For instance, why choose three PGTOs in CGTO1? Since the number of PGTOs is not too overwhelming, I've gone ahead and made a general solution so that I can test all cases. The main problem then boiled down to this: given an N x N matrix of zeros (notice that there are many zeros in this matrix), how do I overwrite subarrays (3 x 1 in this case) in Numpy so I can insert a specific number of coefficients to make CGTO1 from PGTO1 to PGTO3? And how do I make CGTO2 and CGTO3 couple with PGTO4 and PGTO5 using an identity matrix inserted at the correct spot?
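To make the question concrete, here is a minimal sketch of the five-PGTO case using plain slice assignment. It is not the script from the gist: the coefficient values are placeholders, the variable names are made up for illustration, and placing the identity block on the diagonal is my assumption about what "the correct spot" means.

```python
import numpy as np

n_pgto = 5        # PGTO1..PGTO5 (hypothetical sizes matching the example)
n_contracted = 3  # number of primitives folded into CGTO1

# Placeholder numbers; the real ones would be the RHF MO coefficients.
coeffs = np.array([0.6, 0.3, 0.1])

# Start from an N x N matrix of zeros.
C = np.zeros((n_pgto, n_pgto))

# Overwrite the 3 x 1 subarray: rows 0..2 of column 0 become CGTO1.
C[:n_contracted, 0] = coeffs

# Couple the remaining primitives one-to-one with an identity block
# (assumed here to sit on the diagonal, rows 3..4 and columns 3..4).
n_free = n_pgto - n_contracted
C[n_contracted:, n_contracted:] = np.eye(n_free)

print(C)
```

Columns 1 and 2 are still all zeros at this point; they are the excess columns that have to be removed before the matrix has one column per CGTO.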

The trick is to make use of Numpy's excellent (and very difficult) feature known as boolean masking, where a matrix filled with True or False selects the corresponding values so that one can assign to them. In the end, we need to remove the excess columns, and I've found my head can do that easily by transposing the matrix, removing the unneeded rows (using simpler Numpy/Python logic which I know by heart) and transposing it back.
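The masking and the transpose trick can be sketched roughly as follows. Again, this is an illustration and not the code from the gist: the coefficients are placeholders, the names are hypothetical, and the rule "drop every column that is entirely zero" is my assumption about which columns count as excess.

```python
import numpy as np

coeffs = np.array([0.6, 0.3, 0.1])  # placeholder values, not real MO coefficients
C = np.zeros((5, 5))

# Boolean mask that is True exactly where CGTO1's coefficients belong.
mask = np.zeros(C.shape, dtype=bool)
mask[:3, 0] = True
C[mask] = coeffs  # masked elements are filled in row-major order

# A second mask picks the diagonal entries for PGTO4 and PGTO5.
ident = np.zeros(C.shape, dtype=bool)
ident[[3, 4], [3, 4]] = True
C[ident] = 1.0

# Remove the excess columns: transpose, keep only the rows that
# contain something non-zero, and transpose back.
kept_rows = [row for row in C.T if row.any()]
contraction = np.array(kept_rows).T

print(contraction.shape)
print(contraction)
```

The same column removal can be written without the transpose as `C[:, C.any(axis=0)]`, which is boolean masking once more, this time along the column axis.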

This was one of the more difficult issues, but at least I can now generate a multitude of basis set calculations and examine my property of interest in a matter of seconds. Numpy rocks, but I think I could grow old trying to learn all the awesome features it has.