How to use cProfile to profile Python code

Why is your Python app so slow? Find out by using Python’s built-in profiler to locate bottlenecks in your Python code

How to use cProfile to profile Python code
Thinkstock

Python may not be the fastest language around, but it is often fast enough. And Python is ideal when programmer time matters more than CPU time.

That said, if a given Python app is laggy, you’re not obliged to just suck it up. Tools included with the stock installation of the Python interpreter can provide you with detailed feedback about which parts of your program are slow, and offer some hints about how to speed them up.

How to use cProfile

The cProfile module gathers statistics about the execution time of a Python program. It can report on anything from the entire app to a single statement or expression.

Here is a toy example of how to use cProfile:

def add(x,y):
    x+=str(y)
    return x

def add_2(x,y):
    if y % 20000 == 0:
        z=[]
        for q in range(0,400000):
            z.append(q)

def main():
    a=[]
    for n in range(0,200000):
        add(a,n)
        add_2(a,n)
        
if __name__ == '__main__':
    import cProfile
    cProfile.run('main()')

This example runs the application’s main() function and analyzes the performance of main() and everything main() calls. It is also possible to analyze only a part of a program, but the most common use for starters is to profile the whole program.

Run the above example and you will be greeted with something like the following output:

cprofile 01 IDG

What is shown here is a list of all of the function calls made by the program, along with statistics about each:

  • At the top (first line in blue), we see the total number of calls made in the profiled program and the total execution time. You may also see a figure for “primitive calls,” meaning non-recursive calls, or calls made directly to a function that don’t in turn call themselves further down in the call stack.
  • ncalls: Number of calls made. If you see two numbers separated by a slash, the second number is the number of primitive calls for that function.
  • tottime: Total time spent in the function, not including calls to other functions.
  • percall: Average time per call for tottime, derived by taking tottime and dividing it by ncalls.
  • cumtime: Total time spent in the function, including calls to other functions.
  • percall (#2): Average time per call for cumtime (cumtime divided by ncalls).
  • filename:lineno: The file name, line number, and function name for the call in question.

How to modify cProfile reports

By default, cProfile sorts its output by “standard name,” meaning that it sorts by the text in the far-right column (filename, line number, etc.).

The default format is useful if you want a general, top-down report of every single function call for reference. But if you’re trying to get to the bottom of a bottleneck, you will likely want the most time-consuming parts of the program listed first.

We can produce these results by invoking cProfile a little differently. Note how the bottom part of the above program can be reworked to sort the statistics by a different column (in this case ncalls):

if __name__ == '__main__':
    import cProfile, pstats
    profiler = cProfile.Profile()
    profiler.enable()
    main()
    profiler.disable()
    stats = pstats.Stats(profiler).sort_stats('ncalls')
    stats.print_stats()

The results will look something like this:

cprofile 02 IDG

Here is how all this works:

  • Instead of executing a command by way of cProfile.run(), which is not very flexible, we create a profiling object, profiler.
  • When we want to profile some action, we first call .enable() on the profiler object instance, then run the action, then call .disable(). (This is one way to profile only part of a program.)
  • The pstats module is used to manipulate the results collected by the profiler object and print those results.

Combining a profiler object and pstats allows us to manipulate the captured profile data — for instance, to sort the generated statistics differently. In this example, using .sort_stats('ncalls') sorts the statistics by the ncalls column. Other sort options are available.

How to use cProfile results for optimization

The sort options available for cProfile output allow us to tease out potential performance bottlenecks in a program.

ncalls

The first and most significant piece of information you can unearth with cProfile is which functions get called most frequently, by way of the ncalls column.

In Python, the mere act of making a function call incurs a relatively large amount of overhead. If some function is called repeatedly in a tight loop, even if it isn’t a long-running function, that’s guaranteed to impact performance.

In the above example, the function add (and the function add_2) is called repeatedly in a loop. Moving the loop into the add function itself, or inlining the add function entirely, would fix this problem.

tottime

Another useful statistic details which functions the program spends most of its time executing, by way of the tottime column.

In the above example, the add_2 function uses a loop to simulate some expensive computation, which pushes its tottime score to the top. Any function with a high tottime score deserves a close look, especially if it is called many times or in a tight loop.

Note that you always need to consider the context in which the function is used. If a function has a high tottime but is only called once — for instance, only when the program starts — it is less likely to be a bottleneck. However, if you’re trying to reduce startup time, you’ll want to know whether a function called at startup is making everything else wait.

How to export cProfile data

If you want to use cProfile's generated statistics in more advanced ways, you can export them to a data file:

stats = pstats.Stats(profiler)
stats.dump_stats('/path/to/stats_file.dat')

This file can be read back in by using the pstats module, then sorted or displayed with pstats. The data can also be re-used by other programs. Two examples:

  • pyprof2calltree renders detailed visualizations of the program’s call graph and usage stats from profile data. This article provides a detailed real-world example of its use.
  • snakeviz also generates visualizations from cProfile data, but uses a different representation for the data — a “sunburst” rather than pyprof2calltree’s “flame” graph.

Beyond cProfile for Python profiling

cProfile is hardly the only way to profile a Python application. cProfile is certainly one of the most convenient ways, given that it’s bundled with Python. But others deserve attention.

One project, py-spy, builds a profile for a Python application by sampling its call activity. py-spy can be used to examine a running Python app without having to stop and restart it, and without having to alter its codebase, so it can be used to profile deployed applications. py-spy also generates some statistics about the overhead incurred by the Python runtime (for instance, garbage collection overhead), which cProfile does not.

Copyright © 2019 IDG Communications, Inc.

How to choose a low-code development platform