Profile modin line by line

Introduction

We profile modin line by line to find the bottlenecks inside a function and decide where to focus our optimization effort. To do so, we use line_profiler. Though the library is easy to use, it has to be built from source and cannot be installed directly from PyPI. Here we go through the steps to install line_profiler and to use it for modin development.

Installation

We begin with a virtual environment that has modin installed as described here. Building line_profiler from source requires Cython, so we install it first:

pip install -U cython

Then, we run the following commands from the line_profiler README:

git clone https://github.com/rkern/line_profiler.git
find line_profiler -name '*.pyx' -exec cython {} \;
cd line_profiler
pip install . --user

Usage

To profile a function, we have to (1) write a script that calls the function and (2) add the @profile decorator to the function we want to profile. As an example, we will profile modin/pandas/dataframe.py::sum(). To do so, we first create a file called profile_script.py with the following contents:

import modin.pandas as pd

df = pd.DataFrame({"col1": [1,2,3], "col2": [4,5,6]})
df.sum()

Then, we add the @profile decorator to sum() in modin/pandas/dataframe.py as follows:

    @profile
    def sum(
        self,
        axis=None,
        skipna=None,
        level=None,
        numeric_only=None,
        min_count=0,
        **kwargs
    ):
        axis = self._get_axis_number(axis)
        new_index = self.columns if axis else self.index
        if min_count > len(new_index):
            return Series(
                [np.nan] * len(new_index), index=new_index, dtype=np.dtype("object")
            )
        return super(DataFrame, self).sum(
            axis=axis,
            skipna=skipna,
            level=level,
            numeric_only=numeric_only,
            min_count=min_count,
            **kwargs
        )
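Note that profile is never imported: kernprof injects it into Python's builtins when it runs the script with -l, so a decorated module raises NameError if it is imported any other way. The following is a minimal sketch of a fallback that keeps decorated code importable outside kernprof. The function add is a hypothetical stand-in for the function being profiled, and the fallback is meant for a local working copy rather than for committing to modin.

```python
import builtins

# `kernprof -l` adds a `profile` decorator to builtins before running the script.
# When the code is run without kernprof, fall back to a decorator that does nothing.
profile = getattr(builtins, "profile", lambda func: func)


@profile
def add(a, b):
    # Hypothetical stand-in for the function under investigation.
    return a + b
```

With this in place the module behaves the same under plain python, and is still profiled line by line under kernprof -l.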

Then, from the directory containing profile_script.py, we run

kernprof -lv profile_script.py

which displays the profile results in the terminal as soon as the script finishes. If we want to save the profile results to view later, we can instead run

kernprof -l profile_script.py

which writes the profile information to profile_script.py.lprof. To view the saved results, we run

python -m line_profiler profile_script.py.lprof
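The saved file can also be inspected from Python, which is handy when only the most expensive lines matter. The sketch below assumes that line_profiler.load_stats returns an object with a timings dict, mapping (filename, first_lineno, function_name) to a list of (lineno, nhits, total_time) tuples, and a unit attribute giving seconds per timer tick. The helper names hottest_lines and report are hypothetical and not part of line_profiler.

```python
def hottest_lines(timings, unit, top=5):
    """Return the `top` most expensive lines as (seconds, hits, filename, lineno)."""
    rows = []
    for (filename, _first_lineno, _func_name), line_stats in timings.items():
        for lineno, nhits, total_time in line_stats:
            # total_time is in timer ticks; multiply by unit to get seconds.
            rows.append((total_time * unit, nhits, filename, lineno))
    rows.sort(reverse=True)  # largest time first
    return rows[:top]


def report(lprof_path, top=5):
    """Print the most expensive lines recorded in an .lprof file."""
    # Imported lazily so hottest_lines() can be used without line_profiler installed.
    from line_profiler import load_stats

    stats = load_stats(lprof_path)
    for seconds, nhits, filename, lineno in hottest_lines(stats.timings, stats.unit, top):
        print(f"{seconds:10.6f}s  {nhits:6d} hits  {filename}:{lineno}")
```

Calling report("profile_script.py.lprof") after the kernprof -l run above would then list the slowest lines of sum() first.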