Python Performance Tips

This page is devoted to various tips and tricks that help improve the performance of your Python programs. Wherever the information comes from someone else, I've tried to identify the source.

If you have any light to shed on this subject, let me know.

Profiling Code

The first step to speeding up your program is learning where the bottlenecks lie. It hardly makes sense to optimize code that is never executed or that already runs fast. I use two modules to help locate the hotspots in my code, profile and trace.

Profile Module

The profile module is included as a standard module in the Python distribution. Using it to profile the execution of a set of functions is quite easy. Suppose your main function is called main, takes no arguments and you want to execute it under the control of the profile module. In its simplest form you just execute

import profile
profile.run('main()')
When main() returns, the profile module will print a table of function calls and execution times. The output can be tweaked using the Stats class included with the module. For more details, check out the profile module's documentation (Lib/profile.doc).
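For example, the profiling data can be saved to a file and then sorted with the companion pstats module. Here's a sketch; main is just a stand-in workload, the output file name is arbitrary, and runctx is used in place of run so the namespaces are explicit:

```python
import profile
import pstats

def main():
    # stand-in workload; substitute your program's entry point
    total = 0
    for i in range(1000):
        total = total + i
    return total

# save the raw timing data to a file instead of printing immediately
profile.runctx('main()', globals(), locals(), 'prof.out')

# load the data and print the five most expensive functions by time
stats = pstats.Stats('prof.out')
stats.sort_stats('time').print_stats(5)
```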

Trace Module

The trace module is a spin-off of the profile module I wrote originally to perform some crude statement level test coverage. You use it in pretty much the same fashion as profile, however the result is an annotated listing of the various Python source files that were accessed during the run.

import trace
trace.Coverage().run('main()')
There's no documentation. You just have to browse the code.
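The version of trace that later shipped with Python exposes a Trace class; here is a minimal sketch of its line-counting mode, assuming that newer API (main is again a stand-in):

```python
import trace

def main():
    total = 0
    for i in range(10):
        total = total + i
    return total

# count how many times each line executes, without echoing
# each line as it runs
tracer = trace.Trace(count=1, trace=0)
tracer.runfunc(main)

# results.counts maps (filename, line_number) -> execution count
results = tracer.results()
```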

Sorting

From Guido van Rossum

Sorting lists of basic Python objects is generally pretty efficient. The sort method for lists takes an optional comparison function as an argument that can be used to change the sorting behavior. This is quite convenient, though it can really slow down your sorts.

An alternative way to speed up sorts is to construct a list of tuples whose first element is a sort key that will sort properly using the default comparison, and whose second element is the original list element.

Suppose, for example, you have a list of tuples that you want to sort by the n-th field of each tuple. The following function will do that.

def sortby(list, n):
    nlist = map(lambda x, n=n: (x[n], x), list)
    nlist.sort()
    return map(lambda (key, x): x, nlist)
Here's an example use:
>>> list = [(1, 2, 'def'), (2, -4, 'ghi'), (3, 6, 'abc')]
>>> list.sort()
>>> list 
[(1, 2, 'def'), (2, -4, 'ghi'), (3, 6, 'abc')]
>>> sortby(list, 2)
[(3, 6, 'abc'), (1, 2, 'def'), (2, -4, 'ghi')]
>>> sortby(list, 1) 
[(2, -4, 'ghi'), (1, 2, 'def'), (3, 6, 'abc')]
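The same decorate-sort-undecorate idea can be written without lambda; here is a sketch using the list comprehensions that arrived in later Pythons:

```python
def sortby(seq, n):
    # decorate: pair each element with its sort key
    decorated = [(item[n], item) for item in seq]
    # sort the pairs using the default tuple comparison
    decorated.sort()
    # undecorate: strip the keys back off
    return [item for (key, item) in decorated]

data = [(1, 2, 'def'), (2, -4, 'ghi'), (3, 6, 'abc')]
```

sortby(data, 2) returns the list ordered by the third field of each tuple, just as in the session above.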

String Concatenation

Strings in Python are immutable. This fact frequently sneaks up and bites novice Python programmers on the rump. Immutability confers some advantages and disadvantages. In the plus column, strings can be used as keys in dictionaries and they can be shared. (Python shares one- and two-character strings.) In the minus column, you can't say something like, "change all the 'a's to 'b's" in any given string. Instead, you have to create a new string with the desired properties. This continual copying can lead to significant inefficiencies in Python programs.

From Aaron Watters

Please mention string.split and string.join and the sprintf-like string substitution features -- proper use of these pushes a lot of work to C-speed subroutines and also makes programs cleaner.

Avoid this:

   str = ""
   for substring in list:
       str = str + substring

Use string.join(list, ""). The former is a very common and catastrophic mistake when building large strings. Similar errors are easy to make when using reduce to build large structures.

Similarly, if you are generating bits of a string sequentially instead of

   str = ""
   for x in list:
       str = str + some_function(x)
use
   str = [None]*len(list)
   for i in range(len(list)):
       str[i] = some_function(list[i])
   str = string.joinfields(str, "")

Avoid:

   out = "<html>" + head + prologue + query + tail + "</html>"
Instead, use
   out = "<html>%s%s%s%s</html>" % (head, prologue, query, tail)

This is a lot faster, and also easier to modify. The former recopies what might be big strings several times over; the latter copies them only once.
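The difference is easy to see in a short runnable sketch (written with the method form of join that later Pythons grew; the sample strings are arbitrary):

```python
parts = ["head", "prologue", "query", "tail"]

# quadratic: every iteration copies the entire accumulated string
out = ""
for substring in parts:
    out = out + substring

# linear: the result is sized and filled in a single pass
joined = "".join(parts)
```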

Loops

Python supports a couple of looping constructs. The for statement is most commonly used. It loops over the elements of a sequence, assigning each to the loop variable. If the body of your loop is simple, the interpreter overhead of the for loop itself can be a substantial amount of the overhead. This is where the map function is handy. You can think of map as a for moved into C code. The only restriction is that the "loop body" of map must be a function call.

Here's a straightforward example. Instead of looping over a list of words and converting them to upper case:

import string
newlist = []
for word in list:
    newlist.append(string.upper(word))
you can use map to push the loop from the interpreter into compiled C code:
import string
newlist = map(string.upper, list)

Guido van Rossum wrote a much more detailed examination of loop optimization that is definitely worth reading.

Avoiding dots...

Suppose you can't use map? The example above of converting words in a list to upper case has another inefficiency. Both newlist.append and string.upper are function references that are recalculated each time through the loop. The original loop can be replaced with:
import string
upper = string.upper
newlist = []
append = newlist.append
for word in list:
    append(upper(word))

Local Variables

The final speedup available to us for the non-map version of the for loop is to use local variables wherever possible. If the above loop is cast as a function, append and upper become local variables.

def func():
    upper = string.upper
    newlist = []
    append = newlist.append
    for word in words:
        append(upper(word))
    return newlist
An extra performance boost is received because local variables are accessed more efficiently than variables at module scope. On my machine (100MHz Pentium running BSDI), I got the following times for converting the list of words in /usr/share/dict/words (38,470 words) to upper case:

Version                     Time (seconds)
Basic loop                  3.47
Eliminate dots              2.45
Local variable & no dots    1.79
Using map function          0.54

Eliminating the loop overhead by using map is often going to be the most efficient option. When the complexity of your loop precludes its use, however, the other techniques described above are still available to speed things up.
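The three variants can be compared directly. This is a sketch using modern spellings (str.upper in place of string.upper, and list(map(...)) since map later became lazy); the timings it prints will of course differ from the table above:

```python
import time

words = ["python"] * 100000

def basic_loop():
    newlist = []
    for word in words:
        newlist.append(word.upper())    # attribute lookups every pass
    return newlist

def no_dots():
    newlist = []
    append = newlist.append             # bind the methods once, up front
    upper = str.upper
    for word in words:
        append(upper(word))
    return newlist

def with_map():
    return list(map(str.upper, words))  # the loop itself runs in C

for func in (basic_loop, no_dots, with_map):
    start = time.perf_counter()
    result = func()
    print(func.__name__, time.perf_counter() - start)
```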

Initializing Dictionary Elements

Suppose you are building a dictionary of word frequencies and you've already broken your words up into a list. You might execute something like:

wdict = {}
for word in words:
    if not wdict.has_key(word): wdict[word] = 0
    wdict[word] = wdict[word] + 1

Except for the first time, each time a word is seen the if statement's test fails. If you are counting a large number of words, many will probably occur multiple times. In a situation where the initialization of a value is only going to occur once and the augmentation of that value will occur many times it is cheaper to use a try statement:

wdict = {}
for word in words:
    try:
        wdict[word] = wdict[word] + 1
    except KeyError:
        wdict[word] = 1

It's important to catch the expected exception rather than use a default except clause; a bare except would also trap exceptions raised by the statement(s) in the try clause that you really can't handle there.

Note that if the try clause generates an exception most of the time, it will often be more efficient to test for the exceptional condition than to generate it and recover from it.
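Here is the counting loop as a small worked example (the sample text is arbitrary):

```python
words = "the quick brown fox jumps over the lazy dog the fox".split()

wdict = {}
for word in words:
    try:
        wdict[word] = wdict[word] + 1
    except KeyError:
        wdict[word] = 1       # only the first occurrence pays this cost
```

Every word after its first occurrence takes the cheap path through the try clause.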

Import Statement Overhead

import statements can be executed just about anywhere. It's often useful to place them inside functions to restrict their visibility and/or reduce initial startup time. Although Python's interpreter is optimized to not import the same module multiple times, repeatedly executing an import statement can seriously affect performance in some circumstances.

Consider the following two snippets of code (originally from Greg McFarlane, I believe - I found it unattributed in a comp.lang.python/python-list@python.org posting and later attributed to him in another source):

def doit():
    import string             ###### import statement inside function
    string.lower('Python')

for num in range(100000):
    doit()
or:
import string             ###### import statement outside function
def doit():
    string.lower('Python')

for num in range(100000):
    doit()
The second version will run substantially faster than the first, even though the reference to the string module is global in the second example.

This example is obviously a bit contrived, but the general principle holds.
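Here is a runnable sketch of the same comparison (string.capwords stands in for string.lower, and the printed timings are only illustrative):

```python
import time

def doit_inner():
    import string             # executed on every single call
    return string.capwords('python')

import string                 # executed exactly once

def doit_outer():
    return string.capwords('python')

for func in (doit_inner, doit_outer):
    start = time.perf_counter()
    for num in range(100000):
        func()
    print(func.__name__, time.perf_counter() - start)
```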

Using map with Dictionaries

I found it frustrating that, in order to use map to eliminate simple for loops like:

dict = {}
nil = []
for s in list:
    dict[s] = nil
I had to use a lambda form or define a named function, which would probably negate any speedup I was getting by using map in the first place. I decided I needed some functions to let me set, get or delete dictionary keys and values en masse. I proposed a change to Python's dictionary object and used it for a while. However, a more general solution appears in the form of the operator module in Python 1.4. Suppose you have a list and you want to eliminate its duplicates. Instead of the code above, you can execute:
import operator
dict = {}
map(operator.setitem, [dict]*len(list), list, [])
list = dict.keys()
This moves the for loop into C where it executes much faster.
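A runnable sketch of the technique follows. Note that newer versions of map stop at the shortest sequence instead of padding with None, so the value sequence must be spelled out in full, and the lazy map must be forced with list():

```python
import operator

seq = ['a', 'b', 'a', 'c', 'b']
d = {}
# one C-driven setitem call per element: d[s] = None for each s in seq
list(map(operator.setitem, [d] * len(seq), seq, [None] * len(seq)))
unique = list(d.keys())
```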

Data Aggregation

(Paraphrased from a note by Aaron Watters)

Function call overhead in Python is relatively high, especially compared with the execution speed of a builtin function. This strongly suggests that extension module functions should handle aggregates of data where possible. Here's a contrived example written in Python. (Just pretend the function was written in C. :-)

x = 0
def doit(i):
    global x
    x = x + i

list = range(10000)
for i in list:
    doit(i)
vs.
x = 0
def doit(list):
    global x
    for i in list:
	x = x + i

list = range(10000)
doit(list)
Even written in Python, the second example runs about four times faster than the first. Had doit been written in C the difference would likely have been even greater (exchanging a Python for loop for a C for loop as well as removing most of the function calls).
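Here are the two versions side by side in runnable form, written with a small accumulator argument instead of the global (the printed timings are only illustrative):

```python
import time

def doit_one(i, acc):
    # called once per element: 10000 separate Python-level calls
    acc[0] = acc[0] + i

def doit_many(seq, acc):
    # called once: the loop over the elements stays inside the function
    for i in seq:
        acc[0] = acc[0] + i

nums = range(10000)

a = [0]
start = time.perf_counter()
for i in nums:
    doit_one(i, a)
per_item_time = time.perf_counter() - start

b = [0]
start = time.perf_counter()
doit_many(nums, b)
aggregate_time = time.perf_counter() - start

print(per_item_time, aggregate_time)
```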

Doing Stuff Less Often

The Python interpreter performs some periodic checks. In particular, it decides whether or not to let another thread run and whether or not to run a pending call (typically a call established by a signal handler). Most of the time there's nothing to do, so performing these checks each pass around the interpreter loop can slow things down. There is a function in the sys module, setcheckinterval, which you can call to tell the interpreter how often to perform these periodic checks. In Python 1.4 it defaults to 10. If you aren't running with threads and you don't expect to be catching lots of signals, setting this to a larger value can improve the interpreter's performance, sometimes substantially.
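In later Pythons the check interval was replaced by a switch interval measured in seconds, but the idea is the same; a sketch against that newer API:

```python
import sys

# how often the interpreter considers switching threads, in seconds
default = sys.getswitchinterval()

# check less often: fewer interruptions for single-threaded code
sys.setswitchinterval(0.05)

sys.setswitchinterval(default)    # restore the original setting
```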


Skip Montanaro
(skip@mojam.com)

Last modified: Tue Nov 2 10:40:34 CST 1999