Using all cores on a laptop

Hi,
I am trying to use Modin to parallelize a few pandas operations. But when I run my code and watch the CPU usage with 'htop' in a terminal, it does not seem to be using all 8 cores that my machine has.

Here is what I am doing. I have a large dataframe df (553257 rows), which is a subset of a much larger dataset, and I run:

df1 = df.groupby(['Id', 'Title']).agg({'Text': ' '.join}).reset_index()

Here: https://modin.readthedocs.io/en/latest/UsingPandasonRay/dataframe_supported.html, I noticed that the 'groupby' operation is 'not yet optimized'. Furthermore, the 'agg' operation is 'Partially implemented'. If the docs are up to date, could this be the cause?

I also tried import ray; ray.init(num_cpus=8), but it made no difference. I have both Dask and Ray installed. I have not explicitly set my compute engine, hoping that Modin will choose one automagically.

Or, if you have any suggestions to ensure that all cores are being utilized, that would be great.
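One way to take the automatic engine choice out of the picture is to set it explicitly. The sketch below is a minimal example, assuming the MODIN_ENGINE environment variable is read when modin.pandas is first imported. The fallback to plain pandas is only there so the snippet still runs on a machine without Modin. This only controls which engine is used; it does not change whether a given operation is parallelized.

```python
import os

# Pick the engine explicitly instead of relying on auto-detection.
# Assumption: this must be set before modin.pandas is imported.
os.environ["MODIN_ENGINE"] = "ray"  # or "dask"

try:
    import modin.pandas as pd
except ImportError:
    # Fallback so the sketch still runs where Modin is not installed.
    import pandas as pd

# Number of cores Python can see, to compare against what htop shows.
n_cores = os.cpu_count()
print(f"engine={os.environ['MODIN_ENGINE']}, cores={n_cores}")
```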

Thanks!
Sri.

Hi @srini,

df1 = df.groupby(['Id', 'Title']).agg({'Text': ' '.join}).reset_index()

This line will default to pandas for the moment because groupby on multiple columns is not yet supported. I am planning on implementing this within the month. That is what the documentation means by "not yet optimized". agg should be fully implemented for your use case.
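For reference, here is the same groupby/agg line run on a tiny frame, as a minimal sketch. The sample rows are made up for illustration and only the column names come from the question. It uses plain pandas, which is what the line falls back to anyway. Note that the quotes must be plain ASCII quotes for the line to parse.

```python
import pandas as pd

# Tiny made-up sample using the column names from the question.
df = pd.DataFrame({
    "Id": [1, 1, 2],
    "Title": ["a", "a", "b"],
    "Text": ["hello", "world", "foo"],
})

# Group on both key columns and join each group's Text values with a space.
df1 = df.groupby(["Id", "Title"]).agg({"Text": " ".join}).reset_index()
print(df1)
```

Each (Id, Title) pair ends up as one row whose Text is the space-joined text of that group.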

Let me know if you have any other questions!

Devin