API Inspiration (Arquero)

First of all, I just want to say that I’ve been following Modin for a while now and have been super impressed with the vision and the progress.

I’ve noticed “Modin API” on the roadmap and have been excited about what that will bring. I’ve long been a fan of dplyr in R and have always wished for such an API in Python. There has been no shortage of attempts to create a Python clone, but I think that they’ve always tried to stay too true to the R implementation instead of embracing the strengths and weaknesses of each language.

A month or so ago, Jeffrey Heer released Arquero for JavaScript. I suspect that you may already be familiar with Jeffrey, but he has a history with data analysis APIs (Trifacta, datalib, dataflow-api). JavaScript is much closer to Python in syntax than R is, and I think that the API ideas of Arquero could make for a solid basis for a Python design. I’m really impressed with the balance that he’s come up with.

There are some other really interesting APIs that I’ve seen around the Python ecosystem that I’d be happy to chat about, but I thought that I’d bring this one to your attention since it may not be as likely to cross your radar.

Thanks @jcmkk3, I wasn’t aware of Arquero. I will definitely play around with it.

I think one of the biggest issues with the dplyr clones in Python is the lack of backwards compatibility with existing code. There’s a huge amount of “legacy” code that would have to be supported, and it’s hard for institutions to sign on to supporting both that legacy code and a new tool.

That’s a major part of why Modin is architected in such a modular way: we want to support “legacy” pandas alongside a more compact and easy-to-understand API running on the same execution. There are a lot of competing projects in the data space, and it’s really hard for users to gauge what will be useful to them (partly because of questionable marketing techniques). Modin is arguably the easiest project to move to and from for your existing workloads, and I think the moving “from” is just as important as the moving “to”. Lock-in is a huge problem in modern systems and we also want to remove that barrier. It should be easy to use what you want at any given time!

Thanks again for sharing this!