Modin on Databricks

Hi,
I was trying to install Modin from PyPI on a Spark cluster in Azure Databricks. Installing with the modin[all] syntax fails, so I then tried installing Ray and Dask first, followed by plain Modin.

All three libraries installed on the cluster individually (without the modin[…] syntax), but I can't use them in a notebook: it errors out again and asks me to install with the modin[ray] or modin[dask] syntax.
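One way to narrow this down is to check, from inside the notebook itself, which of the three packages the notebook's Python interpreter can actually locate, since a library being installed on the cluster does not guarantee it is on the notebook's import path. The following is a minimal diagnostic sketch using only the standard library; the helper name importable_packages is hypothetical.

```python
import importlib.util
import sys


def importable_packages(names=("modin", "ray", "dask")):
    """Map each top-level package name to True if the current interpreter
    can locate it on its import path (the package is not imported)."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Print the interpreter the notebook is running; if it differs from the
    # one the packages were installed into, imports can fail.
    print(sys.executable)
    for name, found in importable_packages().items():
        print(f"{name}: {'found' if found else 'NOT found'}")
```

If modin is found but neither ray nor dask is, that would be consistent with the error asking for modin[ray] or modin[dask].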

Please let me know if there is another way to get this working.

Regards,
Nachiketa

Hi @nachiketa,

It is strange that pip install modin[all] fails; I would not expect that. Do you recall the error message you saw?

Did you try installing with pip install modin[ray] and pip install modin[dask]? Dask should work with whichever version pip installs, but Modin currently depends on a specific version of Ray. You can also install that Ray version directly with pip install ray==0.7.3.
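Because Modin is pinned to a specific Ray release, it can help to confirm which Ray version the notebook actually sees. Below is a minimal sketch using importlib.metadata (Python 3.8+); the helper name ray_version_status is hypothetical, and the pinned value 0.7.3 is simply the one named in this thread, so newer Modin releases may expect a different version.

```python
from importlib.metadata import PackageNotFoundError, version

# Ray version named in this thread; other Modin releases may pin another one.
REQUIRED_RAY_VERSION = "0.7.3"


def ray_version_status(required=REQUIRED_RAY_VERSION, get_version=version):
    """Compare the installed Ray version with the pinned one.

    Returns (status, installed): status is "missing", "ok" or "mismatch";
    installed is the version string, or None when Ray is not installed.
    """
    try:
        installed = get_version("ray")
    except PackageNotFoundError:
        return "missing", None
    status = "ok" if installed == required else "mismatch"
    return status, installed


if __name__ == "__main__":
    status, installed = ray_version_status()
    if status == "missing":
        print(f"Ray is not installed; try: pip install ray=={REQUIRED_RAY_VERSION}")
    elif status == "mismatch":
        print(f"Ray {installed} is installed, but {REQUIRED_RAY_VERSION} is expected")
    else:
        print(f"Ray {installed} matches the pinned version")
```

The get_version parameter only exists so the check can be exercised without Ray installed; in a notebook you would call ray_version_status() with no arguments.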

Also, note that Modin runs in local mode by default. Cluster mode currently has only preliminary support, on EC2 using Ray's autoscaler. More details are here: https://modin.readthedocs.io/en/latest/using_modin.html#using-modin-on-a-cluster-experimental
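Since local mode is the default, the remaining choice in a notebook is which engine Modin uses. The Modin documentation describes selecting it with the MODIN_ENGINE environment variable, which should be set before modin.pandas is first imported. The sketch below wraps that in a small validating helper; the name select_modin_engine and the restriction to "ray" and "dask" are assumptions for illustration, and the accepted values depend on the Modin version installed.

```python
import os

# Engines this sketch accepts; the valid set depends on the Modin version.
SUPPORTED_ENGINES = ("ray", "dask")


def select_modin_engine(engine):
    """Validate the engine name and store it in MODIN_ENGINE for this process.

    Call this before the first `import modin.pandas` so Modin sees the value.
    Returns the normalised (lower-case) engine name.
    """
    engine = engine.lower()
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(
            f"unsupported engine {engine!r}; expected one of {SUPPORTED_ENGINES}"
        )
    os.environ["MODIN_ENGINE"] = engine
    return engine
```

In a notebook you would run select_modin_engine("dask") and only then import modin.pandas as pd. In local mode the work should stay on the single machine hosting the notebook's Python process; Modin does not distribute it across the Spark executors.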

Let me know if this helps or if you have any other questions!