Hi Devin, and thanks for your answer and for the work you put into Modin, I find it very promising!
I’m specifically talking about a use case where the data is fetched from a database and is, in total, larger than would fit in the memory of any single machine. What I’d like to do is split the work across multiple workers running on different machines, so that each one processes a subset of the data without running out of memory, and then combine the results (which are much, much smaller than the original data). So yes, in some sense this has to do with query optimization and planning, but more under the angle of sharding the data and reducing the memory footprint than under the usual angle of accelerating computation. Here I’m rather talking about making the computation possible at all.
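To make the pattern concrete, here is a minimal sketch of what I mean, assuming a simple sum aggregation and a fabricated `fetch_chunk` standing in for the database query; it uses plain pandas with a local process pool, whereas in practice the workers would run on different machines:

```python
import pandas as pd
from concurrent.futures import ProcessPoolExecutor  # stand-in for a multi-machine scheduler

def fetch_chunk(i):
    # Hypothetical placeholder for fetching one partition from the database
    # (e.g. a keyed or LIMIT/OFFSET query). Fabricated here to keep the
    # sketch self-contained.
    return pd.DataFrame({"key": ["a", "b"], "value": [i, i * 2]})

def process_chunk(i):
    # Each worker holds only its own chunk in memory and immediately
    # reduces it to a small partial aggregate.
    chunk = fetch_chunk(i)
    return chunk.groupby("key")["value"].sum()

def combine(partials):
    # The partial aggregates are tiny compared to the raw data, so
    # merging them on a single machine is cheap.
    return pd.concat(partials).groupby(level=0).sum()

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as pool:
        partials = list(pool.map(process_chunk, range(8)))
    print(combine(partials))
```

At no point does any single process hold more than one chunk of the raw data, which is the property that matters when the full dataset exceeds the memory of any one machine.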