Is SQL on Modin still in-memory only?

SQL On Modin/Ray sounds exciting! (SQL on Modin Dataframes — Modin 0.10.1+14.gb58663b.dirty documentation)

The link mentions that it is only on data that can be held in memory. But with Ray 1.3+, I believe that out-of-core data computation is supported by default.

Does that mean that SQL on Modin can be applied on data that’s larger than available memory?

Thanks,
Rajiv

@RAbraham Great question! The SQL API for Modin is still in early alpha, and we’re constantly trying to improve it.

Currently, it will support out-of-core computation, but not in the same way a traditional database does. If you read from a SQL database, Modin will still pull that data into memory to compute on it as a dataframe. We have ideas for more sophisticated ways of handling this, but haven’t been able to implement them yet.
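To make the distinction concrete, here is a minimal sketch using only Python's standard-library `sqlite3` module (not Modin itself). The `trips` table and `fare` column are made up for illustration. The first approach mirrors the dataframe pattern described above, where every row is pulled to the client before computing; the second lets the database engine do the aggregation so that only the result leaves it.

```python
import sqlite3

# Hypothetical in-memory table, used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (id INTEGER PRIMARY KEY, fare REAL)")
conn.executemany(
    "INSERT INTO trips (fare) VALUES (?)",
    [(float(i),) for i in range(1, 1001)],
)

# Dataframe-style: fetch every row into client memory, then compute.
rows = conn.execute("SELECT fare FROM trips").fetchall()
client_total = sum(fare for (fare,) in rows)

# Database-style: push the aggregation into the engine, so only
# a single value is returned to the client.
(db_total,) = conn.execute("SELECT SUM(fare) FROM trips").fetchone()

print(len(rows), client_total, db_total)
conn.close()
```

Both approaches give the same total; the difference is how much data has to sit in the client's memory while it is computed.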

Please do let me know if you try it and how it goes. Any feedback would be super helpful!

Hi @devin-petersohn
Just to check that I understand: say I pull data from a SQL database and Modin loads it into memory. If the data is too large to hold in memory, it will be spilled to disk, and Ray will then seamlessly pull the right amount of data back into memory to do the computation without an OOM?

@RAbraham That is what I expect, but it has not been tested yet. We are still building out the SQL support and haven’t been able to test such cases.
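For anyone who wants to try this, here is a rough sketch of pointing Ray's object spilling at a specific directory before Modin is imported. It assumes Ray 1.3+ and that Modin reuses a Ray instance that is already initialized. The helper names and the `/tmp/spill` path are placeholders, and this is not a tested recipe for the SQL API.

```python
import json


def spilling_system_config(directory):
    """Build the _system_config dict for Ray filesystem object spilling."""
    return {
        "object_spilling_config": json.dumps(
            {"type": "filesystem", "params": {"directory_path": directory}}
        )
    }


def init_ray_with_spilling(directory="/tmp/spill"):
    """Start Ray with spilling directed at `directory` (placeholder path)."""
    import ray  # imported lazily so the config helper works without Ray

    ray.init(_system_config=spilling_system_config(directory))


if __name__ == "__main__":
    init_ray_with_spilling()
    # Assumption: Modin picks up the Ray instance started above.
    import modin.pandas as pd  # noqa: F401
```

Whether spilling actually prevents an OOM for a given SQL workload is exactly the part that has not been tested yet.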


Thanks, @devin-petersohn. I’ll circle back later.