Read multiple CSV files

Hi,
I have a bunch of CSV files (as a result of an export of a BigQuery table) and I am trying to use modin to work with them in memory.

Prior to this, I used dask to read the multiple CSV files, which is easy to do. If we suppose that the CSV files are named data-001.csv, data-002.csv, …, we can load them all using:

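import dask.dataframe as dd

# Lazily read every file matching the glob pattern as one dask DataFrame
ddf = dd.read_csv(
    'data-*.csv'
)
# Materialize the result in memory as a single pandas DataFrame
df = ddf.compute()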

This loads the data as a pandas DataFrame. Is there a way to load the data using the modin library? And is it possible to get a modin.pandas.DataFrame instead of a pandas.DataFrame?

Thanks!

Hi @lucasrodes, thanks for the question!

At the moment, there isn’t a syntax for doing this in the modin.pandas API. This is because the pandas API itself does not support this style of access. As it stands, you would need to loop over the files and pd.concat them all together.
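
A minimal sketch of that loop-and-concat approach, in case it helps. The helper name read_csv_glob and its pd parameter are made up for this example and are not part of the modin or pandas API. It defaults to plain pandas so it runs anywhere; passing modin.pandas instead should return a modin DataFrame, assuming modin and one of its engines are installed.

```python
import glob

import pandas


def read_csv_glob(pattern, pd=pandas):
    """Read every CSV matching `pattern` and concatenate them row-wise.

    `pd` is the DataFrame library to use: plain pandas by default, or
    `modin.pandas` if you want a modin DataFrame back (assumption: modin
    is installed and mirrors `read_csv` and `concat`).
    """
    # Sort so that data-001.csv, data-002.csv, ... are read in a stable order
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files match {pattern!r}")

    # Read each file separately, then stack the pieces into one frame
    frames = [pd.read_csv(path) for path in paths]
    return pd.concat(frames, ignore_index=True)
```

With modin this would be called as read_csv_glob('data-*.csv', pd=modin.pandas) after import modin.pandas. Note that ignore_index=True discards the per-file row indices and renumbers the rows from 0, which is usually what you want for exported table shards.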

We would definitely consider an extension to the API that supports this style of access. Do you want to request the feature on GitHub? https://github.com/modin-project/modin/issues/new/choose

Thanks for your response, @devin-petersohn, I understand! Will be posting an issue then.