This is my first post, so pardon any mistakes. I see that Modin can use either Dask or Ray as its engine. Will Modin with Dask as the compute engine provide the same performance as using dask.DataFrame directly? How is using Modin with Dask different from dask.DataFrame?
You can check out a nice write-up comparing Dask and Modin here: modin/modin_vs_dask.md at c23b3ceb8e085b403c036bc9070b6d757b42926e in modin-project/modin on GitHub
Hope this helps
Hi @msahu, there’s another page here with more low-level information: Modin vs. Dask Dataframe — Modin 0.9.1+13.gc23b3ce documentation
The general idea is that Modin should perform at least as well as Dask, but supports significantly more operators (2.5x more).
Thanks @devin-petersohn and @Garra1980 for your quick responses. The documentation explains the difference between Dask DataFrame and Modin really well. One more thing I wanted to clarify: will Modin allow using larger-than-memory DataFrames with Dask as the backend?
@msahu Yes, it should allow this automatically. Please let us know if you run into any issues with it!