I tried to use modin to parallelize pandas operations in a plotly/dash environment.
However, when creating data frames with modin, they look the same but seem to lack the column names:
import modin.pandas as pd
# import pandas as pd
import numpy as np
from dash import Dash, html, dcc, callback
from dash.dependencies import Input, Output
import dash_bootstrap_components as dbc
import plotly.express as px
import ray
ray.init(runtime_env={'env_vars': {'__MODIN_AUTOIMPORT_PANDAS__': '1'}})
# tried with df from numpy ...
rand_num_mat = np.random.rand(10, 2)
df = pd.DataFrame(rand_num_mat, columns=['x', 'y'])
# or from csv
# df = pd.read_csv('random.csv')
# columns names 'x' and 'y' are there in both versions
print(df.columns)
app = Dash()
app.layout = html.Div([
    dcc.Graph(id='figure'),
    dbc.Button(id='btn-populate', children='Populate my graph!')
])
@callback(
    Output('figure', 'figure'),
    Input('btn-populate', 'n_clicks'),
    config_prevent_initial_callbacks=True
)
def populate_test_modin_df(n):
    if n:
        # works with pandas and modin
        # return px.scatter(df, x=0, y=1)
        # works with pandas but not with modin
        return px.scatter(df, x='x', y='y')
if __name__ == '__main__':
    app.run_server(debug=True)
Hi @luggie, unfortunately, modin.pandas is known to be incompatible with plotly. I think plotly would need some changes to work with Modin. I recommend converting your dataframe to pandas with _to_pandas() and then trying to plot that.
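As a minimal sketch of that workaround: the helper below (as_pandas is a hypothetical name, not part of Modin or plotly) hands back a plain pandas dataframe, calling _to_pandas() only when the input is not already one. To keep it runnable without Modin or Ray installed, the demo line uses a regular pandas dataframe; a modin.pandas dataframe would go through the _to_pandas() branch instead.

```python
import numpy as np
import pandas


def as_pandas(df):
    """Return a plain pandas DataFrame for a pandas or Modin dataframe.

    Hypothetical helper for illustration; not part of Modin or plotly.
    """
    if isinstance(df, pandas.DataFrame):
        # Already plain pandas, nothing to convert.
        return df
    # Modin dataframes expose _to_pandas(), as recommended above.
    to_pandas = getattr(df, "_to_pandas", None)
    if callable(to_pandas):
        return to_pandas()
    raise TypeError(f"cannot convert {type(df).__name__} to a pandas DataFrame")


# Demo with plain pandas so the snippet runs without Modin installed.
plain_df = as_pandas(pandas.DataFrame(np.random.rand(10, 2), columns=["x", "y"]))
print(list(plain_df.columns))
```

In the callback from the question, this would presumably amount to returning px.scatter(as_pandas(df), x='x', y='y'). Note that the conversion materializes the whole dataframe in the local process, so it may be worth reducing the data before plotting.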
I believe the kind of error you’re hitting happens because plotly has code like
df = pandas.DataFrame(result)
where result is a modin dataframe. That happens to sort of work in pandas, but it resets the column names to 0, 1, …
Let’s please follow up on the existing plotly issues in Modin:
Hi @mahesh,
Thanks for the links and the help!
It appears to me that this is a problem with modin, since pandas.DataFrames do not drop their column names when fed into plotly_express.scatter(), and modin is a (certainly not complete) implementation of the pandas API, right? That means the pandas.DataFrame() function is not yet fully supported by modin.
Well, at the end of the day, it does not really matter whose job it is, since the issue has been known for some time now and, unfortunately, neither side sees it as important enough to work on.
I really wish modin worked with plotly and other graphing libraries already, but if a library other than pandas takes a dataframe df as input and calls pandas.DataFrame(df), Modin can’t guarantee on its own that the resulting pandas dataframe will be valid in any way. Modin has no control over what the pandas constructor does with its inputs. Modin aims to provide the same semantics as the pandas API, but does not guarantee that a modin.pandas object can substitute for the equivalent pandas object anywhere. To take another example, regular pandas.merge(df1, df2) assumes that both dataframes are pandas dataframes, and it will access internal fields of both dataframes. It would be too painful for pandas maintainers to make pandas accept modin dataframes in all such methods. Likewise, modin.pandas.merge should not have to consider whether its inputs are pandas or modin dataframes. Modin dataframes do not subclass pandas dataframes and are not pandas dataframes, so pandas.DataFrame(pandas_df) will work but pandas.DataFrame(modin_df) may not.
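The distinction can be sketched with a stand-in class. FrameLike below is purely hypothetical and is not how Modin is implemented; it only mirrors a small part of the pandas API without subclassing pandas.DataFrame. That is enough to show why library code that checks types or reaches into pandas internals cannot treat such an object as a pandas dataframe until it is converted explicitly.

```python
import pandas


class FrameLike:
    """Hypothetical stand-in: mirrors part of the pandas API without subclassing it."""

    def __init__(self, data, columns):
        # The real data lives in a wrapped pandas dataframe.
        self._frame = pandas.DataFrame(data, columns=columns)

    @property
    def columns(self):
        # Same attribute name and semantics as pandas.DataFrame.columns.
        return self._frame.columns

    def _to_pandas(self):
        # Explicit conversion back to a genuine pandas dataframe.
        return self._frame


wrapped = FrameLike([[1, 2], [3, 4]], columns=["x", "y"])

# The wrapper offers the same column names ...
print(list(wrapped.columns))
# ... but it is not a pandas.DataFrame until converted explicitly.
is_pandas_before = isinstance(wrapped, pandas.DataFrame)
is_pandas_after = isinstance(wrapped._to_pandas(), pandas.DataFrame)
print(is_pandas_before, is_pandas_after)
```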