Thanks for the awesome framework!
I have a few questions that I couldn’t find answered in the user guide; it would be great if someone could help me.
What is the correct way to access all columns during an apply (or groupby + apply) operation?
I’m aware that Modin parallelizes apply column-wise, but doesn’t this break the pandas API? To my knowledge, in pandas, apply with axis=1 passes each row with all of its columns, and groupby(...).apply passes each group as a DataFrame containing all columns. That makes apply suitable for operations that need more than one column, unlike e.g. aggregate.
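For reference, here is a minimal plain-pandas sketch (not Modin) of which columns the applied function actually sees in each case. The toy data, `weighted_mean`, and the column names are made up for illustration:

```python
import pandas as pd

# Hypothetical toy data: a grouping key and two value columns
df = pd.DataFrame({"key": ["a", "a", "b"], "x": [1, 2, 3], "y": [10, 20, 30]})

# axis=0 (the default): the function receives one column at a time as a Series
col_names = df[["x", "y"]].apply(lambda col: col.name)

# axis=1: the function receives one row at a time, with every column available
row_sums = df.apply(lambda row: row["x"] + row["y"], axis=1)


# groupby + apply: the function receives each group as a sub-DataFrame
# containing all selected columns, so it can combine "x" and "y" freely
def weighted_mean(group):
    return (group["x"] * group["y"]).sum() / group["y"].sum()


group_means = df.groupby("key")[["x", "y"]].apply(weighted_mean)
```

The groupby case is the one that needs several columns at once, which is why splitting it column-wise would not preserve the pandas behavior.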
Additionally, for groupby + apply, it seems to me that parallelizing over the groups instead of the columns would make more sense: unlike column-wise parallelization, it would still match the pandas semantics.
Finally, it would be great if we could choose how to parallelize, e.g. by group, column, or row.
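The kind of selectable group-, column-, or row-wise parallelism described above can be sketched in plain pandas with a thread pool. This is not how Modin works internally: `parallel_apply` and its `mode` parameter are hypothetical names, and threads are used only to keep the sketch self-contained.

```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def parallel_apply(df, func, mode, by=None, max_workers=4):
    """Hypothetical helper: split df into independent chunks and apply func to each.

    mode="group":  one task per group of `by` (func sees all columns of the group)
    mode="column": one task per column (func sees a single Series)
    mode="row":    one task per row (func sees all columns of that row)
    """
    if mode == "group":
        groups = list(df.groupby(by))
        keys = [k for k, _ in groups]
        chunks = [g for _, g in groups]
    elif mode == "column":
        keys = list(df.columns)
        chunks = [df[c] for c in df.columns]
    elif mode == "row":
        keys = list(df.index)
        chunks = [row for _, row in df.iterrows()]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Threads keep the sketch simple; CPU-bound speedups would need processes
    # or a distributed engine such as Ray or Dask
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(func, chunks))  # map preserves chunk order
    return pd.Series(results, index=keys)


df = pd.DataFrame({"key": ["a", "a", "b"], "x": [1, 2, 3], "y": [10, 20, 30]})
per_group = parallel_apply(df, lambda g: (g["x"] * g["y"]).sum(), mode="group", by="key")
per_column = parallel_apply(df[["x", "y"]], lambda s: s.sum(), mode="column")
per_row = parallel_apply(df, lambda r: r["x"] + r["y"], mode="row")
```

Only the "group" and "row" modes give the function access to more than one column, which is the distinction the question is about.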
Is there a way to force Modin to use the standard pandas implementation for a particular operation? There are cases where parallelization is not desirable, such as when you need to share state between threads, or when you know that the additional overhead is not worth it.
In such situations it would be nice to be able to quickly (and programmatically) switch back and forth between standard pandas and Modin.
Related to this is the ability to force single-threaded execution for a specific operation or sequence of operations, which could also solve the problem.
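One way to approximate this today is to convert to pandas explicitly around the operations that should not be parallelized. A minimal sketch, assuming a Modin version that provides `modin.utils.to_pandas`; `run_in_pandas` is a hypothetical helper name, and the `try`/`except` exists only so the sketch also runs where Modin is not installed:

```python
import pandas

try:
    import modin.pandas as mpd
    from modin.utils import to_pandas  # assumption: present in recent Modin releases
except ImportError:  # keep the sketch runnable without Modin
    mpd = None


def run_in_pandas(df, func):
    """Hypothetical helper: run func on a plain pandas copy of df, then convert back.

    Inside func everything is ordinary pandas running in the calling Python
    process, so it can share state freely and skips Modin's scheduling.
    """
    if mpd is not None and isinstance(df, mpd.DataFrame):
        result = func(to_pandas(df))
        # Wrap DataFrame results back into Modin; leave other results as-is
        return mpd.DataFrame(result) if isinstance(result, pandas.DataFrame) else result
    return func(df)


counter = {"calls": 0}


def stateful_step(pdf):
    # Mutating shared state is safe here because nothing is distributed
    counter["calls"] += 1
    return pdf.assign(z=pdf["x"].cumsum())


out = run_in_pandas(pandas.DataFrame({"x": [1, 2, 3]}), stateful_step)
```

The trade-off is that converting to pandas materializes the whole frame in the driver process, so this only pays off when the data fits in memory and the operation is cheap or inherently sequential.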
Thanks for the help.