tqdm is a fantastic, easy-to-use, extensible progress bar package for Python. It makes adding simple progress bars to Python processes extremely easy. If you’re a Data Scientist or Machine Learning (ML) Engineer with some experience, chances are you’ll have used or developed algorithms or data transformations that can take a fair while – perhaps many hours or even days – to complete.
Invariably, many Data Scientists opt to simply print status messages to the console, or in some slightly more sophisticated cases use the (excellent and recommended) built-in logging module. In a lot of cases this is fine. However, if you’re running a task with many hundreds of steps (e.g. training epochs), or over a data structure with many millions of elements, these approaches are sometimes a little unclear and verbose, and frankly kind of ugly.
Show me the code!
This is where tqdm can come in. It has a nice clean API that lets you quickly add progress bars to your code. Plus it has a lightweight ‘time-remaining’ estimation algorithm built in to the progress bar too. The tqdm package is used in a few ML packages, one of the more prominent perhaps being implicit, a Python implicit matrix factorisation library. In implicit, training jobs are tracked with tqdm as they can sometimes run for quite some time. For the purposes of this post, take a look at the example of a mocked-up training loop using tqdm:

    import time

    from tqdm import tqdm

    with tqdm(total=100) as progress:
        for i in range(100):
            time.sleep(0.25)
            progress.update(1)

In this simple example, you set up a tqdm progress bar that expects a process of 100 steps. You then run the mock training loop (with a 0.25 second pause between steps), updating the progress bar each time a step completes. You can also update the progress bar by arbitrary amounts, and the context manager closes the bar cleanly if you break out of the loop early. That’s two lines of code (plus the import statement) to get a rich progress bar in your code.
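To illustrate updating by arbitrary amounts, here is a minimal sketch that advances the bar once per batch rather than once per item. The records list and batch_size value are hypothetical stand-ins for a real workload:

```python
from tqdm import tqdm

# Hypothetical workload: 1,000 records handled in batches of 64.
records = list(range(1000))
batch_size = 64

processed = 0
with tqdm(total=len(records), desc="Processing") as progress:
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        processed += len(batch)
        # Advance the bar by the number of records handled, not by one.
        progress.update(len(batch))
```

Because the final batch is smaller than the rest, passing len(batch) to update keeps the bar's total in step with the number of records actually processed.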
Beyond cool little additions to your program’s outputs, tqdm also integrates nicely with other widely used packages. Probably the most interesting integration for Data Scientists is with Pandas, the ubiquitous Python data analysis library. Take a look at the example below:
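
    import pandas as pd
    from tqdm import tqdm

    df = pd.read_csv("weather.csv")
    # Register progress_apply on Pandas objects before calling it
    tqdm.pandas(desc="Applying Transformation")
    df.progress_apply(lambda x: x)
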
The tqdm.pandas method monkey patches the progress_apply method onto Pandas data structures, giving them a modified version of the commonly used apply method. Practically, when you call the progress_apply method, the package wraps the standard Pandas apply method with a tqdm progress bar. This can come in really handy when you’re processing large data frames!
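If you don't have a CSV to hand, the same idea can be tried on a small in-memory frame. This is only a sketch: the column names and values are made up for illustration, and the Fahrenheit conversion stands in for whatever transformation you actually need:

```python
import pandas as pd
from tqdm import tqdm

# Hypothetical stand-in for weather.csv so the sketch runs without a file.
df = pd.DataFrame({
    "temp_c": [12.5, 18.0, 21.3, 9.8],
    "humidity": [0.61, 0.48, 0.40, 0.75],
})

# Register progress_apply on Pandas objects; desc labels the bar.
tqdm.pandas(desc="Converting to Fahrenheit")

# Series.progress_apply behaves like Series.apply, with a progress bar.
df["temp_f"] = df["temp_c"].progress_apply(lambda c: c * 9 / 5 + 32)
```

On four rows the bar completes instantly, but the call pattern is the same for a frame with millions of rows.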
There's one other common application that's worth mentioning here too: tqdm is great for setting up progress bars for parallel processes. Here is an example using some of tqdm's built-in support for updating a progress bar for a parallel map:
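A minimal sketch of such a parallel map, assuming the built-in support in question is process_map from tqdm.contrib.concurrent. The body of my_process here is hypothetical (it just pauses and squares its input), and the worker count is an arbitrary choice:

```python
import time

from tqdm.contrib.concurrent import process_map


def my_process(x):
    """Hypothetical unit of work: pause briefly, then square the input."""
    time.sleep(0.01)
    return x * x


if __name__ == "__main__":
    # One shared bar that advances as results come back from the workers.
    results = process_map(my_process, range(100), max_workers=4, chunksize=1)
    print(results[:5])
```

The main guard matters on platforms that start worker processes by re-importing the script.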
In this case, you'll have a single progress bar that gets updated each time a my_process call finishes. There's a second use case though: what if you've got a few long-running processes and you want to track these individually? This might be preferable if you want to avoid serialising and de-serialising large objects into and out of processes, for example. You can do that too:
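One way to sketch this is to give each worker its own bar and pin it to a terminal row with tqdm's position argument, sharing a lock across processes so the bars don't overwrite one another. The long_task function, its step count, and the pool size are all hypothetical:

```python
import time
from multiprocessing import Pool, RLock

from tqdm import tqdm


def long_task(task_id):
    """Hypothetical long-running job that owns its own progress bar."""
    steps = 50
    # position pins each bar to its own terminal row.
    with tqdm(total=steps, desc=f"Task {task_id}", position=task_id) as bar:
        for _ in range(steps):
            time.sleep(0.01)
            bar.update(1)
    return task_id, steps


if __name__ == "__main__":
    # Share one lock so workers do not write over each other's output.
    tqdm.set_lock(RLock())
    with Pool(processes=3, initializer=tqdm.set_lock,
              initargs=(tqdm.get_lock(),)) as pool:
        results = pool.map(long_task, range(3))
```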
This should give you an output something along the lines of:
There's a Gist of this example you can use too.