I’m trying to use Modin to do a distributed merge operation on random data (10M rows, 2 integer columns). I have 2 nodes in the cluster with 16 cores each, and I’m using Ray 1.4.1. I believe there’s enough memory in the cluster for this operation, but the merge fails with “ray.exceptions.ObjectLostError: Object is lost due to node failure”.
I was ONLY able to merge 1M rows.
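For anyone trying to reproduce this, here is a minimal sketch of the kind of script I mean. It is not my exact code. The helper names (`make_frame`, `run_merge`, `expected_merge_rows`), the column names `key`/`val`, and the key range are all hypothetical, and it falls back to plain pandas when Modin isn’t installed so it can run anywhere. The key range is a parameter because, for uniformly random keys, an inner merge produces roughly n_left × n_right / n_keys rows, so two 10M-row inputs can yield about 10M rows or far more depending on how the keys were drawn.

```python
import numpy as np

try:
    # Modin's drop-in pandas API; with Ray as the engine the frames are
    # partitioned across the Ray workers. To use an existing cluster, connect
    # first (e.g. ray.init(address="auto")) before the first Modin operation.
    import modin.pandas as pd
except ImportError:
    # Fallback so the sketch still runs (single process) without Modin.
    import pandas as pd


def make_frame(n_rows, n_keys, seed):
    """Random integer frame: join column 'key' in [0, n_keys) plus one value column."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "key": rng.integers(0, n_keys, size=n_rows),
        "val": rng.integers(0, 1_000_000, size=n_rows),
    })


def expected_merge_rows(n_left, n_right, n_keys):
    """Expected inner-join size when both sides draw keys uniformly from n_keys values."""
    return n_left * n_right / n_keys


def run_merge(n_rows, n_keys=None, seed=0):
    # Hypothetical default: key range equal to the row count, so each key
    # appears about once per side and the output stays close to n_rows rows.
    n_keys = n_keys or n_rows
    left = make_frame(n_rows, n_keys, seed)
    right = make_frame(n_rows, n_keys, seed + 1)
    return left.merge(right, on="key", how="inner", suffixes=("_l", "_r"))
```

With Ray connected to the cluster, `run_merge(1_000_000)` and `run_merge(10_000_000)` correspond to the two sizes mentioned above.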
Given the resources available on the nodes, I believe the cluster should be able to handle a much larger merge than this.
Please let me know what I’m missing here.
I have opened a GitHub issue for this, which has more information on the cluster and the error logs.