Take the following code:
import modin.pandas as pd
if isinstance(df, pd.DataFrame):
    pass  # do something
else:
    pass  # do something else
Checking whether an object is a Modin dataframe shouldn’t be costly, and it definitely shouldn’t create an entire multiprocessing execution context just because Modin was imported.
Another case is someone using Modin inside their own library: importing that library shouldn’t start an engine on the user’s behalf. Those two cases alone were enough for me to make the change.
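To illustrate the idea, here is a minimal sketch of deferring engine startup until it is actually needed. This is not Modin's actual implementation; `_ensure_engine`, `engine_startups`, and the stand-in `DataFrame` class are hypothetical names made up for the example, and the "engine" is just a counter.

```python
# Hypothetical sketch of lazy engine startup; not Modin's real code.
_state = {"started": False, "startups": 0}


def _ensure_engine():
    """Start the pretend execution engine the first time it is needed."""
    if not _state["started"]:
        # A real engine would spin up worker processes here (the costly part).
        _state["startups"] += 1
        _state["started"] = True


def engine_startups():
    """Return how many times the pretend engine has been started."""
    return _state["startups"]


class DataFrame:
    """Stand-in dataframe: defining or type-checking it never starts the engine."""

    def __init__(self, data):
        # Only creating real data requires the engine.
        _ensure_engine()
        self._data = list(data)

    def sum(self):
        return sum(self._data)
```

With this shape, `isinstance(obj, DataFrame)` on an arbitrary object costs nothing beyond the type check, and the startup cost is paid once, at the first point where a dataframe is actually created.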
Syntactically it is also nicer. We are going from:
import ray
ray.init() # don't mess up the order!
import modin.pandas as pd
df = pd.read_parquet("some.parquet")
to supporting:
import modin.pandas as pd
import ray
ray.init()
df = pd.read_parquet("some.parquet")
Also, the engine no longer needs to be set before the import:
import modin.pandas as pd
from modin.config import Engine
Engine.put("dask")
df = pd.read_parquet("some.parquet")
Does that make sense?