Eliminating for Loops Python Analysis Code

That moment when you really see why for-loops are bad.


Alejandro Soto


February 6, 2022


Assumed Audience: Scientists and/or programmers who work with NumPy and Python. All others are also welcome!

Using Python has empowered my scientific research. Since Python is free and open source, I avoid depending on my employer for my research tools. With a large, active community, I have many resources available when I need to learn new Python programming techniques or improve my existing code. However, I have struggled with some of the concepts that make Python and its ecosystem (NumPy, Matplotlib, etc.) so powerful, including list comprehensions, broadcasting, and other techniques for optimizing code execution. Thus, I was happy to find a clear demonstration of why I need to invest in learning this aspect of NumPy and Python.

When working with output from the planetWRF general circulation model, I need to deal with the Arakawa C-grid, which means that in the output data the horizontal velocity outputs/quantities are staggered with respect to the other quantities (e.g., temperature, pressure, etc.). The model also outputs two vertical dimensions, whose points on the grid are staggered from each other. All of this staggering of some of the quantities is great if you want to calculate fluxes, but if you want to know the wind speed at the same point as the temperature, then you will need to interpolate from the staggered grid to the unstaggered grid.
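As a toy illustration (the values here are made up, and the real data is 4-dimensional), interpolating a 1-D staggered quantity to the unstaggered points just means averaging each pair of adjacent staggered values:

```python
import numpy as np

# Toy 1-D illustration (values are made up): wind is stored on
# staggered points (cell edges), while quantities like temperature
# live on unstaggered points (cell centers) halfway between them.
v_stag = np.array([2.0, 4.0, 6.0, 8.0])  # 4 staggered (edge) values

# Interpolating to the 3 unstaggered (center) points means averaging
# each pair of adjacent staggered values.
v = np.zeros(len(v_stag) - 1)
for j in range(len(v_stag) - 1):
    v[j] = 0.5 * (v_stag[j] + v_stag[j + 1])

print(v)  # [3. 5. 7.]
```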

In the past, I would use for-loops to interpolate from a staggered to an unstaggered grid. And since I was working with 4-dimensional data (time, latitude, longitude, and height), I would use two for-loops, nested. Here’s what the code looked like:

vstag = ncdata.variables["V"][:]
(ntimes,zi,yi_s,xi) = np.shape(vstag)                              
v = np.zeros((ntimes,zi,yi_s-1,xi),'float32') 
for ti in range(ntimes):
    for j in range(yi_s - 1):
        v[ti,:,j,:] = 0.5*(vstag[ti,:,j,:] + vstag[ti,:,j+1,:])

Let’s ignore most of the details here and focus on the two for-loops, one iterating over ti (via for ti in range(ntimes)) and the other iterating over j (via range(yi_s - 1)). As we shall see, these two for-loops are computationally expensive.

I found the for-loops easy to implement because I could visualize the process. For-loops are sort of a brute force process, at least in my mind. I can imagine calculating the resulting array myself, by stepping through each iteration in the loop. It makes sense to me.

As I started to understand how to write this code more pythonically, however, I began to see that I needed to change how I viewed this code. Instead of imagining how I would do the calculation by hand, I needed to see the code more mathematically. In the velocity equation in the example above, v[ti,:,j,:] = 0.5*(vstag[ti,:,j,:] + vstag[ti,:,j+1,:]), I need to relate the j and j+1 to the NumPy array notation. Normally, if you want all of the elements in one dimension of an array, you just use the notation :, but what this notation implicitly says is 0:n, where n is the size of the dimension¹. When I write the for-loop, I am coding this mathematical equation:

\(v = \frac{1}{2}( v_{stag, n-1} + v_{stag, n})\)

When \(n=1\) then \(n-1=0\). Basically, the first instance of \(v_{stag}\) is for \(n = 0,1,2,...,n-1\) and the second instance is for \(n = 1,2,3,...,n\). If I translate this thinking to the Python code, assuming just one dimension for now, then I have v = 0.5*(vstag[0:n-1] + vstag[1:n]). If we then simplify this to actual NumPy/Python notation, then we have v = 0.5*(vstag[:-1] + vstag[1:]). This is what I can use to replace the inner for-loop. Now my code looks like:

vstag = ncdata.variables["V"][:]
(ntimes,zi,yi_s,xi) = np.shape(vstag)                              
v = np.zeros((ntimes,zi,yi_s-1,xi),'float32') 
for ti in range(ntimes):
    v[ti,:,:,:] = 0.5*(vstag[ti,:,:-1,:] + vstag[ti,:,1:,:])
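As a quick 1-D sanity check (the array contents below are arbitrary, chosen just for illustration), the slice expression reproduces the loop element for element:

```python
import numpy as np

# Quick 1-D check that the slice expression matches the for-loop.
vstag = np.arange(6, dtype="float32")  # [0, 1, 2, 3, 4, 5]
n = len(vstag)

# Loop version: average each adjacent pair.
v_loop = np.zeros(n - 1, dtype="float32")
for j in range(n - 1):
    v_loop[j] = 0.5 * (vstag[j] + vstag[j + 1])

# Slice version: shift-by-one, no loop needed.
v_slice = 0.5 * (vstag[:-1] + vstag[1:])

print(np.array_equal(v_loop, v_slice))  # True
```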

Removing the remaining for-loop is easy. Thanks to broadcasting in NumPy I don’t need that loop at all. All I have to do is replace the variable ti with another colon. The NumPy/Python system will take care of the rest. Now my code looks like:

vstag = ncdata.variables["V"][:]
(ntimes,zi,yi_s,xi) = np.shape(vstag)                              
v = np.zeros((ntimes,zi,yi_s-1,xi),'float32') 
v[:,:,:] = 0.5*(vstag[:,:,:-1,:] + vstag[:,:,1:,:])
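To convince myself the loop-free version really computes the same thing, here is a sanity check on a small random 4-D array (the shape is invented purely for the test):

```python
import numpy as np

# Compare the nested-loop version against the fully sliced version
# on a small random 4-D array (shape chosen only for this test).
rng = np.random.default_rng(0)
vstag = rng.random((3, 4, 5, 6)).astype("float32")  # (time, z, y_stag, x)
ntimes, zi, yi_s, xi = vstag.shape

# Original method: two nested for-loops.
v1 = np.zeros((ntimes, zi, yi_s - 1, xi), "float32")
for ti in range(ntimes):
    for j in range(yi_s - 1):
        v1[ti, :, j, :] = 0.5 * (vstag[ti, :, j, :] + vstag[ti, :, j + 1, :])

# Loop-free method: slice over every dimension.
v3 = 0.5 * (vstag[:, :, :-1, :] + vstag[:, :, 1:, :])

print(np.allclose(v1, v3))  # True
```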

Let’s see how the three methods compare in a speed test. I coded each version in a Jupyter notebook, running it from within VS Code. For the second and third methods, I renamed the velocity to v2 and v3, respectively. Thanks to VS Code’s timing banner in each Jupyter cell, every run automatically reported its execution time. In the figure below, you can see how each of the methods performed, with the execution time shown in the bottom left of the cell.

A screenshot showing how replacing for-loops with broadcasting speeds up the calculations, at least in Python

And here’s that moment when I really saw why for-loops are bad. The code with two for-loops took six and a half minutes to execute. Remove just one for-loop, the inner one, and the execution time is just 11 seconds. That’s 35 times faster! Replacing both for-loops increases the execution speed by 780 times (almost 3 orders of magnitude) compared to the original code.
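For readers without the notebook, a rough timing sketch on a small synthetic array shows the same trend (the array shape here is invented for the demo; the real model output is far larger, so the gap only grows with size):

```python
import time

import numpy as np

# Rough timing comparison on a synthetic 4-D array (shape is made up).
vstag = np.random.rand(20, 30, 40, 50).astype("float32")
ntimes, zi, yi_s, xi = vstag.shape

# Time the nested-loop version.
start = time.perf_counter()
v_loop = np.zeros((ntimes, zi, yi_s - 1, xi), "float32")
for ti in range(ntimes):
    for j in range(yi_s - 1):
        v_loop[ti, :, j, :] = 0.5 * (vstag[ti, :, j, :] + vstag[ti, :, j + 1, :])
t_loop = time.perf_counter() - start

# Time the fully vectorized version.
start = time.perf_counter()
v_vec = 0.5 * (vstag[:, :, :-1, :] + vstag[:, :, 1:, :])
t_vec = time.perf_counter() - start

print(f"loops: {t_loop:.5f} s  vectorized: {t_vec:.5f} s")
```

The two results agree exactly; only the time spent differs.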

This may seem obvious to more veteran Python programmers, or just better Python programmers. That’s okay. I am writing this for me, to crystallize what I am learning here and so that this documentation will help future Alejandro avoid the mistakes of past Alejandro.


  1. In this notation, I am ignoring the fact that Python ranges are written with the last number being excluded. Thus, I should really write “0:n-1 where n-1 is the size of the dimension”, but that would get confusing with the rest of the discussion. Still, I think the explanation works for this topic.↩︎