I have also just added a PR modin-project/modin#489 for running Arrow Table format natively instead of using pandas as the engine to perform the queries. There are a couple of ways that Arrow allows for operating on its Table format, but for now, I am using Gandiva to accelerate
query code contributed by @suquark on GitHub).
This PR isn’t currently in
modin.experimental.pandas though perhaps it should be. Any thoughts?
Currently, this is what appears:
$ MODIN_BACKEND=Gandiva ipython Python 3.6.6 (v3.6.6:4cf1f54eb7, Jun 26 2018, 19:50:54) Type 'copyright', 'credits' or 'license' for more information IPython 7.0.1 -- An enhanced Interactive Python. Type '?' for help. In : import modin.pandas as pd UserWarning: Gandiva on Ray is experimental, some behaviors may not match expectations. Process STDOUT and STDERR is being redirected to /tmp/ray/session_2019-03-03_16-11-38_6458/logs. Waiting for redis server at 127.0.0.1:45366 to respond... Waiting for redis server at 127.0.0.1:20080 to respond... Starting Redis shard with 10.0 GB max memory. Starting the Plasma object store with 20.0 GB memory using /tmp.
There is a warning in the first line of the output after it is imported, but it might be getting lost in the regular Ray output.