3 steps to a Python async overhaul

How to speed up an existing Python program by reworking it to run concurrently using async

By Serdar Yegulalp, Senior Writer, InfoWorld

Table of Contents

When to use async in Python
Step 1: Identify the synchronous and asynchronous parts of your program
Step 2: Convert appropriate sync functions to async functions
Step 3: Test your Python async program thoroughly

Python is one of many languages that support some way to write asynchronous programs: programs that switch freely among multiple tasks, all running at once, so that no one task holds up the progress of the others.

Chances are, though, you’ve mainly written synchronous Python programs — programs that do only one thing at a time, waiting for each task to finish before starting another. Moving to async can be jarring, as it requires learning not only new syntax, but also new ways of thinking about one’s code.

In this article, we’ll explore how an existing, synchronous program can be turned into an asynchronous one. This involves more than just decorating functions with async syntax; it also requires thinking differently about how our program runs, and deciding whether async is even a good metaphor for what it does.

When to use async in Python

A Python program is best suited for async when it has the following characteristics:

It’s trying to do something that is mostly bound by I/O or by waiting for some external process to complete, like a long-running network read.
It’s trying to do several of those kinds of tasks at once, while possibly also handling user interactions.
The tasks in question are not computationally heavy.
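
As a rough illustration of these characteristics, the sketch below overlaps three I/O-style waits. The function names are hypothetical, and asyncio.sleep is only a stand-in for a real network read; the point is that the total time stays close to the longest single wait rather than the sum of all three.

```python
import asyncio
import time

async def fake_network_read(name: str, delay: float) -> str:
    # asyncio.sleep stands in for waiting on a socket or other external resource.
    await asyncio.sleep(delay)
    return f"{name} done"

async def read_all() -> list[str]:
    # All three waits overlap, so the total is close to one wait, not three.
    return await asyncio.gather(
        fake_network_read("a", 0.2),
        fake_network_read("b", 0.2),
        fake_network_read("c", 0.2),
    )

start = time.perf_counter()
results = asyncio.run(read_all())
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")
```

Run one after another, the same three reads would take about 0.6 seconds. If the waits were replaced by heavy computation, nothing would overlap and async would offer no gain.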

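A Python program that uses threading to juggle I/O-bound work is typically a good candidate for using async. Because of the global interpreter lock (GIL), only one thread runs Python code at a time, so threads take turns much as async tasks do. The difference is that threads are switched preemptively, at points the interpreter and operating system choose, while async tasks are cooperative and hand over control only at an await. Plus, async offers certain advantages over threads: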

The async/await syntax makes it easy to identify the asynchronous parts of your program. By contrast, it’s often hard to tell at a glance what parts of an app run in a thread.
Because async tasks share a single thread and switch only at await points, code between two awaits runs without interference from other tasks, so shared data rarely needs explicit locking. Threads can be switched at almost any point, so they often require complex mechanisms for synchronization.
Async tasks are easier to manage and cancel than threads.
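
To make the last point concrete, here is a minimal sketch of cancelling a running task. The long_job and cancel_demo names are hypothetical, and the 60-second sleep is just a placeholder for a long wait. Cancellation is delivered at the task's next await as an asyncio.CancelledError.

```python
import asyncio

async def long_job() -> str:
    try:
        await asyncio.sleep(60)  # stand-in for a long wait
        return "finished"
    except asyncio.CancelledError:
        # Clean-up would go here; re-raise so the task is marked as cancelled.
        raise

async def cancel_demo() -> bool:
    task = asyncio.create_task(long_job())
    await asyncio.sleep(0.1)  # give the task a chance to start
    task.cancel()             # request cancellation at its next await
    try:
        await task
    except asyncio.CancelledError:
        pass
    return task.cancelled()

print(asyncio.run(cancel_demo()))
```

Python threads have no comparable built-in way to be stopped from outside; they generally have to poll a flag or an event themselves.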

Using async is not recommended if your Python program has these characteristics:

The tasks have a high computational cost — e.g., they’re doing heavy number-crunching. Heavy computational work is best handled with multiprocessing, which allows you to devote an entire hardware thread to each task.
The tasks don’t benefit from being interleaved. If each task depends on the last, there is no point to making them run asynchronously. That said, if the program involves sets of serial tasks, you could run each set asynchronously.
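
The "sets of serial tasks" case might look something like the sketch below, where each chain runs its steps strictly in order but the chains themselves are interleaved. The names step, serial_chain, and run_chains are made up for illustration, and asyncio.sleep stands in for an I/O wait.

```python
import asyncio

async def step(value: int) -> int:
    await asyncio.sleep(0.05)  # stand-in for an I/O wait
    return value + 1

async def serial_chain() -> int:
    # Each step needs the previous result, so steps within a chain stay in order.
    value = 0
    for _ in range(3):
        value = await step(value)
    return value

async def run_chains() -> list[int]:
    # The chains are independent of one another, so they can be interleaved.
    return await asyncio.gather(serial_chain(), serial_chain())

print(asyncio.run(run_chains()))
```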

Step 1: Identify the synchronous and asynchronous parts of your program

Python async code has to be launched by, and managed by, the synchronous parts of your Python application. To that end, your first task when converting a program to async is to draw a line between the sync and async parts of your code.

In our previous article on async, we used a web scraper app as a simple example. The async parts of the code are the routines that open the network connections and read from the site — everything that you want to interleave. But the part of the program that kicks all that off isn’t async; it launches the async tasks and then closes them off gracefully as they finish up.
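
A stripped-down version of that division could look like this. It is only a sketch: fetch_page and scrape_all are hypothetical names, the URLs are placeholders, and asyncio.sleep stands in for the actual network access.

```python
import asyncio

async def fetch_page(url: str) -> str:
    # Async side: asyncio.sleep stands in for opening a connection and reading.
    await asyncio.sleep(0.1)
    return f"<html for {url}>"

async def scrape_all(urls: list[str]) -> list[str]:
    # Async side: interleave one fetch per URL.
    return await asyncio.gather(*(fetch_page(u) for u in urls))

def main() -> list[str]:
    # Sync side: prepare inputs, start the event loop, and handle the results.
    urls = ["https://example.com/a", "https://example.com/b"]
    pages = asyncio.run(scrape_all(urls))  # the single sync-to-async crossing
    return pages

if __name__ == "__main__":
    for page in main():
        print(page)
```

Everything above the call to asyncio.run is ordinary synchronous code, which makes main a natural place for setup and teardown work.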

It’s also important to separate out any potentially blocking operation from async, and keep it in the sync part of your app. Reading user input from the console, for instance, blocks everything including the async event loop. Therefore, you want to handle user input either before you launch async tasks or after you finish them. (It is possible to handle user input asynchronously via multiprocessing or threading, but that’s an advanced exercise we won’t get into here.)

Some examples of blocking operations:

Console input (as we just described).
Tasks involving heavy CPU utilization.
Using time.sleep to force a pause. Note that you can sleep inside an async function by using asyncio.sleep as a substitute for time.sleep.
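
The difference between the two sleep calls is easy to measure. In this sketch (the function names are hypothetical), three pauses of 0.2 seconds are launched together. With time.sleep they run back to back because each one blocks the event loop, while with asyncio.sleep they overlap.

```python
import asyncio
import time

async def blocking_pause() -> None:
    time.sleep(0.2)           # blocks the whole event loop

async def cooperative_pause() -> None:
    await asyncio.sleep(0.2)  # yields to other tasks while waiting

async def timed(pause) -> float:
    # Launch three pauses together and measure the total wall-clock time.
    start = time.perf_counter()
    await asyncio.gather(pause(), pause(), pause())
    return time.perf_counter() - start

blocking_time = asyncio.run(timed(blocking_pause))
cooperative_time = asyncio.run(timed(cooperative_pause))
print(f"time.sleep: {blocking_time:.2f}s, asyncio.sleep: {cooperative_time:.2f}s")
```

The blocking version should take roughly three times as long as the cooperative one, even though both are written as async functions.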

Step 2: Convert appropriate sync functions to async functions

Once you know which parts of your program will run asynchronously, you can partition them off into functions (if you haven’t already) and turn them into async functions with the async keyword. You’ll then need to add code to the synchronous part of your application to run the async code and gather results from it if needed.

Note: You’ll want to check the call chain of each function you’ve made asynchronous, and make sure it doesn’t invoke a potentially long-running or blocking operation. Async functions can call sync functions directly, and if that sync function blocks, then so does the async function calling it, along with every other task on the event loop.
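
If a blocking sync function can't be avoided or rewritten, one option is to hand it to a worker thread. The sketch below assumes Python 3.9 or later, where asyncio.to_thread is available. The blocking_lookup function is hypothetical, with time.sleep standing in for slow work.

```python
import asyncio
import time

def blocking_lookup(key: str) -> str:
    # Hypothetical sync function that blocks; time.sleep stands in for slow work.
    time.sleep(0.2)
    return key.upper()

async def lookup(key: str) -> str:
    # asyncio.to_thread runs the call in a worker thread,
    # so the event loop stays free for other tasks.
    return await asyncio.to_thread(blocking_lookup, key)

async def lookup_all() -> list[str]:
    return await asyncio.gather(lookup("a"), lookup("b"), lookup("c"))

print(asyncio.run(lookup_all()))
```

This keeps the loop responsive for I/O-style blocking. It does not help with CPU-heavy work, which still competes for the GIL.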

Let’s look at a simplified example of how a sync-to-async conversion might work. Here is our “before” program:

import time

def a_function():
    # some async-compatible action that takes a while
    time.sleep(1)  # stand-in for a slow network or disk operation

def another_function():
    # some sync function, but not a blocking one
    pass

def do_stuff():
    a_function()
    another_function()

def main():
    for _ in range(3):
        do_stuff()

main()

If we want three instances of do_stuff to run as async tasks, we need to turn do_stuff (and potentially everything it touches) into async code. Here is a first pass at the conversion:

import asyncio

async def a_function():
    # some async-compatible action that takes a while
    await asyncio.sleep(1)  # stand-in for a slow network or disk operation

def another_function():
    # some sync function, but not a blocking one
    pass

async def do_stuff():
    await a_function()
    another_function()

async def main():
    tasks = []
    for _ in range(3):
        tasks.append(asyncio.create_task(do_stuff()))
    # gather takes awaitables as separate arguments, so unpack the list
    await asyncio.gather(*tasks)

asyncio.run(main())

Note the changes we made to main. Now main uses asyncio to launch each instance of do_stuff as a concurrent task, then waits for the results (asyncio.gather). We also converted a_function into an async function, since we want all instances of a_function to run side by side, and alongside any other functions that need async behavior.
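
If do_stuff returned a value, asyncio.gather would hand the values back as a list in the order the tasks were passed in, regardless of which finished first. The variation below is a sketch to show that. The return values and the staggered delays are assumptions added for illustration, not part of the program above.

```python
import asyncio

async def do_stuff(n: int) -> int:
    # Later-numbered tasks finish sooner, to show that result order
    # follows submission order rather than completion order.
    await asyncio.sleep(0.03 * (3 - n))
    return n * 10

async def main() -> list[int]:
    tasks = [asyncio.create_task(do_stuff(n)) for n in range(3)]
    return await asyncio.gather(*tasks)

results = asyncio.run(main())
print(results)  # [0, 10, 20]
```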

If we wanted to go a step further, we could also convert another_function to async:

async def another_function():
    # still not a blocking operation, but now callable only with await
    pass

async def do_stuff():
    await a_function()
    await another_function()

However, making another_function asynchronous would be overkill, since (as we’ve noted) it doesn’t do anything that would block the progress of our program. Also, if any synchronous parts of our program called another_function, we’d have to convert them to async as well, which could make our program more complicated than it needs to be.

Step 3: Test your Python async program thoroughly

Any async-converted program needs to be tested before it goes into production to ensure it works as expected.

If your program is modest in size — say, a couple of dozen lines or so — and doesn’t need a full test suite, then it shouldn’t be difficult to verify that it works as intended. That said, if you’re converting the program to async as part of a larger project, where a test suite is a standard fixture, it makes sense to write unit tests for async and sync components alike.

Both of the major test frameworks in Python now feature some kind of async support. Python’s own unittest framework includes a test case class for async functions, unittest.IsolatedAsyncioTestCase (Python 3.8 and later), and pytest has the pytest-asyncio plugin for the same ends.
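
With the standard library alone, an async test might be sketched as follows. The double_later coroutine is a hypothetical function under test. The test class relies on unittest.IsolatedAsyncioTestCase, which runs each async test method inside its own event loop.

```python
import asyncio
import unittest

async def double_later(x: int) -> int:
    # Hypothetical coroutine under test.
    await asyncio.sleep(0.01)
    return x * 2

class DoubleLaterTest(unittest.IsolatedAsyncioTestCase):
    async def test_returns_doubled_value(self):
        # Test methods can be coroutines and use await directly.
        self.assertEqual(await double_later(21), 42)

# Run the test case explicitly so the outcome is available as an object.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(DoubleLaterTest)
result = unittest.TextTestRunner().run(suite)
```

In a real project you would more likely drop the last two lines and let python -m unittest discover the test class.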

Finally, when writing tests for async components, you’ll need to handle their very asynchronousness as a condition of the tests. For instance, there is no guarantee that async jobs will complete in the order they were submitted. The first one might come in last, and some might never complete at all. Any tests you design for an async function must take these possibilities into account.
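
Two standard asyncio tools can help here, sketched below with hypothetical job functions. asyncio.as_completed exposes results in the order they finish, and asyncio.wait_for puts an upper bound on a job that might never complete.

```python
import asyncio

async def job(name: str, delay: float) -> str:
    await asyncio.sleep(delay)  # stand-in for real work
    return name

async def completion_order() -> list[str]:
    # as_completed yields results as they finish, not as they were submitted.
    coros = [job("slow", 0.2), job("fast", 0.05)]
    return [await finished for finished in asyncio.as_completed(coros)]

async def never_finishes() -> None:
    await asyncio.Event().wait()  # nothing ever sets this event

async def times_out() -> bool:
    # wait_for turns a job that never completes into a catchable timeout.
    try:
        await asyncio.wait_for(never_finishes(), timeout=0.1)
    except asyncio.TimeoutError:
        return True
    return False

print(asyncio.run(completion_order()))
print(asyncio.run(times_out()))
```

When completion order isn't part of what a function promises, a test can compare sorted results or sets, so that it doesn't fail just because the jobs finished in a different order.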

Serdar Yegulalp is a senior writer at InfoWorld, focused on machine learning, containerization, devops, the Python ecosystem, and periodic reviews.