Will Modin read a set of CSV files defined by a glob, all with identical columns, in parallel and concatenate them into a single Modin DataFrame?
I’m running through this tutorial that uses SNP data: https://arvkevi.github.io/Out_of_Core_Genomics.html
and want to convert it from Dask to Modin.
In pure Pandas you would either read the files serially, joining each one as you go, or read them all in first and then concat:
df = pd.concat([pd.read_csv(f) for f in glob.glob('data*.csv')], ignore_index=True)
Dask supports reading a glob of CSV files directly.
I’d like Modin to read the files in parallel, though I’m not sure whether that can be expressed with only the Pandas API …
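For reference, here is a minimal, self-contained sketch of the Pandas pattern I would like to carry over. The helper name concat_csv_glob is hypothetical (it is not part of Pandas or Modin), and the comment about swapping the import reflects Modin's usual drop-in usage, not a confirmed answer to the question. Whether Modin would parallelize across the files here, or only within each read_csv call, is exactly the part I'm unsure of.

```python
import glob

import pandas as pd  # assumption: `import modin.pandas as pd` would be the only change for Modin


def concat_csv_glob(pattern, reader=pd.read_csv, concat=pd.concat):
    """Read every CSV matching `pattern` and stack the rows into one frame.

    Hypothetical helper: `reader` and `concat` default to the Pandas
    functions so another Pandas-compatible module can be passed in.
    """
    # glob.glob returns paths in arbitrary order; sort for a stable row order.
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files match {pattern!r}")
    # Files are read one after another, then concatenated once at the end.
    frames = [reader(path) for path in paths]
    return concat(frames, ignore_index=True)
```

Calling concat_csv_glob('data*.csv') should give the same frame as the one-liner above, apart from the sorted file order.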