Tracking Monster Jobs with TQDM

TQDM is a tiny Python package that lets you add customisable progress bars to your code. Ideal for those nasty multi-hour model training jobs.

TQDM is a fantastic, easy-to-use, extensible progress bar package for Python. It makes adding simple progress bars to Python processes extremely easy. If you’re a Data Scientist or Machine Learning (ML) Engineer with some experience, chances are you’ll have used or developed algorithms or data transformations that can take a fair while (perhaps many hours or even days) to complete.

Many Data Scientists opt to simply print status messages to the console, or in some slightly more sophisticated cases use the (excellent and recommended) built-in logging module. In a lot of cases this is fine. However, if you’re running a task with many hundreds of steps (e.g. training epochs), or over a data structure with many millions of elements, these approaches can be unclear, verbose, and frankly kind of ugly.

Show me the code!

That’s where tqdm can come in. It has a nice clean API that lets you quickly add progress bars to your code. Plus it has a lightweight ‘time-remaining’ estimation algorithm built in to the progress bar too. The tqdm package is used in a few ML packages, one of the more prominent perhaps being implicit, a Python implicit matrix factorisation library. In implicit, training jobs are tracked with tqdm as they can sometimes run for quite some time. For the purposes of this post, take a look at the example of a mocked-up training loop using tqdm, below:

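import time
from tqdm import tqdm

# Progress bar that expects 100 steps in total
with tqdm(total=100) as progress:
    for i in range(100):
        time.sleep(0.25)    # mock training step
        progress.update(1)  # mark one step as complete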

In this simple example, you set up a tqdm progress bar that expects a process of 100 steps. Then you run the mock training loop (with a 0.25 second pause between steps), updating the progress bar each time a step completes. You can also advance the progress bar by an arbitrary amount, for example if you break out of the loop early. That’s two lines of code (plus the import statement) to get a rich progress bar in your code.

A basic tqdm progress bar!
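
To illustrate updating by arbitrary amounts, here is a minimal sketch that processes a list in batches and advances the bar by the size of each batch rather than by one. The function name process_in_batches and the batch size are made up for illustration; only tqdm(total=...) and progress.update(n) come from the package itself.

```python
from tqdm import tqdm

def process_in_batches(items, batch_size):
    """Walk through items batch by batch, advancing the bar by each batch's size."""
    processed = 0
    with tqdm(total=len(items), desc="Batches") as progress:
        for start in range(0, len(items), batch_size):
            batch = items[start:start + batch_size]
            # Real work on `batch` would go here (hypothetical).
            processed += len(batch)
            # Advance by the number of items handled, not by a single step.
            progress.update(len(batch))
    return processed

total = process_in_batches(list(range(1050)), batch_size=100)
```

The final batch here is smaller than the rest, so the bar still lands exactly on its total.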

Pandas integration

Beyond cool little additions to your program’s outputs, tqdm also integrates nicely with other widely used packages. Probably the most interesting integration for Data Scientists is with Pandas, the ubiquitous Python data analysis library. Take a look at the example below:

import pandas as pd
from tqdm import tqdm

df = pd.read_csv("weather.csv")
tqdm.pandas(desc="Applying Transformation")
df.progress_apply(lambda x: x)

Technically, the tqdm.pandas method monkey patches the progress_apply method onto Pandas data structures, giving them a modified version of the commonly used apply method. Practically, when you call the progress_apply method, the package wraps the standard Pandas apply method with a tqdm progress bar. This can come in really handy when you’re processing large data frames!

An example of a progress bar generated from tqdm's integration with pandas
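
The same patched method is also available on a single column. The sketch below is self-contained: it builds a small in-memory data frame instead of reading a file, and the column names and the to_fahrenheit function are invented for illustration.

```python
import pandas as pd
from tqdm import tqdm

# Register progress_apply on pandas objects.
tqdm.pandas(desc="Converting")

# Hypothetical in-memory data standing in for a real weather file.
df = pd.DataFrame({"temp_c": [0.0, 10.0, 25.0, 100.0]})

def to_fahrenheit(celsius):
    return celsius * 9 / 5 + 32

# Same result as Series.apply, with a progress bar drawn while it runs.
df["temp_f"] = df["temp_c"].progress_apply(to_fahrenheit)
```

On four rows the bar finishes instantly, of course; the benefit shows up when the applied function is slow or the frame is large.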

Parallel processes

There's one other common application that's worth mentioning here: tqdm is great for setting up progress bars for parallel processes. Here is an example using tqdm's built-in support for updating a progress bar for a parallel map:

import time
from tqdm.contrib.concurrent import process_map

def my_process(_):
    time.sleep(0.25)  # placeholder for real work

if __name__ == "__main__":
    r = process_map(my_process, range(0, 100), max_workers=2, desc="MyProcess")

An example of using tqdm's built-in support for parallel processes.
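
The parallel map helpers also hand back the results as a list, in the same order as the inputs. The sketch below shows this with thread_map, the thread-based counterpart of process_map from the same tqdm.contrib.concurrent module; it is used here only because it runs anywhere without a __main__ guard. The square function is a made-up stand-in for real work.

```python
from tqdm.contrib.concurrent import thread_map

def square(x):
    # Hypothetical stand-in for a real unit of work.
    return x * x

# Results come back as a list, ordered like the input iterable.
results = thread_map(square, range(10), max_workers=4, desc="Squaring")
```

Threads suit I/O-bound work; for CPU-bound work, process_map is usually the better fit.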

In this case, you'll have a single progress bar that gets updated each time a my_process call finishes. There's a second use case though: how about if you've got a few long-running processes and you want to track these individually? This might be preferable if you want to avoid serialising and de-serialising large objects into and out of processes, for example. You can do that too:

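import time
import multiprocessing as mp
from tqdm import tqdm

def my_process(pos):
    _process = mp.current_process()
    # position=pos gives each worker's bar its own row in the terminal
    with tqdm(desc=f"Process {pos}", total=100, position=pos) as progress:
        for _ in range(100):
            time.sleep(0.1)     # placeholder for real work
            progress.update(1)

if __name__ == "__main__":
    n_cpu = mp.cpu_count()
    # Share one tqdm lock across workers so the bars don't overwrite each other
    with mp.Pool(processes=n_cpu, initializer=tqdm.set_lock, initargs=(tqdm.get_lock(),)) as pool:
        pool.map(my_process, range(n_cpu))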

A somewhat more sophisticated example with multiple progress bars.

This should give you an output something along the lines of:

An example of progress bars tracking 8 jobs simultaneously.

There's a Gist of this example you can use too.