If you spend a lot of time waiting around for your code to run, and you've tried all the easier things, you can sometimes get a big speedup by parallelizing it — breaking it into chunks and running each chunk separately. Laptops typically have at least two cores, and many have four. Desktop boxes can have as many as 32. But a standard Python process is effectively single-threaded: because of the Global Interpreter Lock, it only executes Python code on one core at a time. The multiprocessing package helps us sidestep that and use as many cores as we want.

Here's a minimal example that you can copy and paste to get started.

from multiprocessing import Pool
import os
import numpy as np

def f(n):
    # Draw an n-by-n matrix of uniform samples and return its variance.
    return np.var(np.random.sample((n, n)))

if __name__ == "__main__":
    result_objs = []
    n = 1000
    # Leave one core free so the machine stays responsive.
    with Pool(processes=os.cpu_count() - 1) as pool:
        for _ in range(n):
            # apply_async returns immediately with an AsyncResult object;
            # the work runs in the background on the pool's workers.
            result = pool.apply_async(f, (n,))
            result_objs.append(result)

        # get() blocks until the corresponding task has finished.
        results = [result.get() for result in result_objs]
        print(len(results), np.mean(results), np.var(results))
(Here's the snippet on GitLab.)
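Note the if __name__ == "__main__": guard, by the way — on platforms that start worker processes with the "spawn" method (Windows, and macOS on recent Pythons), each worker re-imports the script, and without the guard the pool would try to recreate itself endlessly.

When every task calls the same function with an argument drawn from a list, as here, pool.map is a more compact alternative to apply_async — it submits all the tasks and blocks until the results are in. Here's a sketch of the same computation in that style (my variation, not part of the original snippet):

from multiprocessing import Pool
import os
import numpy as np

def f(n):
    # Same worker function as above.
    return np.var(np.random.sample((n, n)))

if __name__ == "__main__":
    n = 1000
    with Pool(processes=os.cpu_count() - 1) as pool:
        # map distributes the argument list across the workers
        # and returns the results in order.
        results = pool.map(f, [n] * n)
    print(len(results), np.mean(results), np.var(results))

apply_async is still the better fit when your tasks are heterogeneous or you want to do other work while they run.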

The multiprocessing package is incredibly powerful. This is only the tiniest part of its capabilities. But hopefully it's a part that you'll find particularly useful.