Show/Hide Columns as a Toolbar function

I’m very excited to have found Modin yesterday. The OS-Climate project is building a data browser, and Modin closely matches the requested spec for the browser. Yay for open source!

Our data source (corporate financial reports) can be very wide, though it can also be made tidy. When tidy, the observations can be quite simple:

Company X, Reported Year Y, that they emitted Z megatons of CO2

However, users want to browse that data in context, such as an ESG (Environmental-Social-Governance) report that gives:

Company X, in Sector S, Subsector S.1, Subsubsector S.2, Subsubsector S.3, Industry Code S.4, in Country C, in Region R, reported an ESG topic Factor Area FA, Factor F, subfactor SF, reported Year Y, that they emitted Z megatons of CO2

In other words, sometimes analysts are interested in browsing down industry taxonomies before looking at what ESG factors are reported, whereas other times they are interested in browsing down ESG factors, and then want to see which sectors/companies are reporting those factors.
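A small pandas sketch of those two browsing orders over the same tidy data. The column names, sector/factor labels, and values are invented for illustration. Modin is meant to mirror the pandas API (`import modin.pandas as pd`), so the same calls should in principle work there too.

```python
import pandas as pd

# Hypothetical tidy observations: one row per company report, with the
# industry taxonomy and ESG factor carried as ordinary columns.
tidy = pd.DataFrame(
    {
        "company": ["A", "B", "C", "D"],
        "sector": ["Energy", "Energy", "Materials", "Materials"],
        "esg_factor": ["F1", "F2", "F1", "F1"],
        "year": [2020, 2020, 2020, 2020],
        "co2_mt": [12.0, 3.5, 7.25, 1.0],
    }
)

# Taxonomy first: drill down by sector, then see which factors are reported.
by_sector = tidy.groupby(["sector", "esg_factor"])["co2_mt"].sum()

# Factor first: start from an ESG factor, then see which sectors report it.
by_factor = tidy.groupby(["esg_factor", "sector"])["co2_mt"].sum()
```

The data never changes shape between the two views; only the order of the grouping keys does, which is why a tidy layout suits both kinds of analyst.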

What I’d like to do is name column groups and put those groups in the toolbar so their columns can be shown or hidden in the view. Any guidance or gotchas before I start working on spreadsheet.widget.js and
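One way to model named column groups is a plain mapping from group name to column names, with the toolbar only tracking which groups are hidden. This is a sketch under that assumption: `COLUMN_GROUPS`, `visible_columns`, and `toggle_group` are hypothetical names, not part of modin-spreadsheet.

```python
# Hypothetical named column groups, as they might appear as toolbar toggles.
COLUMN_GROUPS: dict[str, list[str]] = {
    "Industry taxonomy": ["sector", "subsector", "industry_code"],
    "Geography": ["country", "region"],
    "ESG factor": ["factor_area", "factor", "subfactor"],
}


def visible_columns(
    all_columns: list[str],
    groups: dict[str, list[str]],
    hidden_groups: set[str],
) -> list[str]:
    """Return the columns to display, preserving the original column order.

    A column is hidden if it belongs to any hidden group; columns that are
    not in any group are always shown.
    """
    hidden = {col for name in hidden_groups for col in groups.get(name, [])}
    return [col for col in all_columns if col not in hidden]


def toggle_group(hidden_groups: set[str], name: str) -> set[str]:
    """Flip one group's visibility, returning a new set (no in-place mutation)."""
    return hidden_groups ^ {name}
```

One gotcha this makes explicit: if groups overlap, a column stays hidden as long as any group containing it is hidden.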

I will note for the record that there is an unfortunate clash in naming: change_selection in operates as users select or deselect rows with the GUI. However, in the tidy data world, “filter” is what selects rows and “select” is what selects columns. My inclination is to rename “selection” to “highlighted”, “row_selection”, or something similar.
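In pandas terms, the tidy-data vocabulary maps roughly onto a boolean mask for rows versus a list of labels for columns; GUI row highlighting is a separate concept from either. The frame below is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "company": ["A", "B", "C"],
        "year": [2019, 2020, 2020],
        "co2_mt": [4.0, 5.5, 2.0],
    }
)

# "Filter" in the tidy-data sense: keep the rows that match a condition.
filtered = df[df["year"] == 2020]

# "Select" in the tidy-data sense: keep a subset of columns.
selected = df[["company", "co2_mt"]]
```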

@MichaelTiemannOSC Thanks for posting and welcome to the Modin Discourse!

The spreadsheet UI is still very early. I am glad you have found it and welcome any and all feedback.

Show/hide would be a really nice functionality. It should probably only be a UI function, and not necessarily needed in the reproducibility cell. A couple of questions here:
1.) Are the column groups persisted as state in the widget, or are they a part of the dataframe?
2.) Would the column groups be highlightable? What other functions besides show/hide would be used?
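To illustrate keeping show/hide out of the reproducibility cell: a history that only records data-changing operations as code, skipping purely presentational ones. How modin-spreadsheet actually builds that cell is not shown here, so `OperationLog` and the operation names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class OperationLog:
    """Hypothetical history that feeds a reproducibility cell.

    Operations that change the data get recorded as code; UI-only
    operations such as showing or hiding a column group are skipped.
    """

    # Hypothetical names of operations treated as purely presentational.
    ui_only: frozenset[str] = frozenset({"show_group", "hide_group"})
    entries: list[str] = field(default_factory=list)

    def record(self, op: str, code: str) -> None:
        # Only data-changing operations end up in the generated code.
        if op not in self.ui_only:
            self.entries.append(code)

    def reproducibility_cell(self) -> str:
        return "\n".join(self.entries)
```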

You are correct that the “selection” would be better named “highlighted”. Are you interested in opening a PR for that change?

Let me know if you have any other questions/feedback!

Thanks for that response!

I see the column groups as persisted in the widget, not the DataFrame. In the kludge I put together (because I don’t have a good Jupyter/GitHub build environment set up yet), I did my best to keep the DataFrame as whole as possible while letting the widget restrict what was visible.
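A minimal sketch of that split, assuming the widget holds its own view state and derives what to render on demand. `WidgetViewState` and `render_view` are hypothetical; the point is that `DataFrame.drop` returns a new frame by default, so the wrapped DataFrame stays whole.

```python
from dataclasses import dataclass, field

import pandas as pd


@dataclass
class WidgetViewState:
    """Hypothetical view state kept on the widget, separate from the DataFrame."""

    hidden_columns: set[str] = field(default_factory=set)


def render_view(df: pd.DataFrame, state: WidgetViewState) -> pd.DataFrame:
    # drop() returns a new DataFrame (inplace=False is the default), so the
    # DataFrame the widget wraps is never modified; errors="ignore" tolerates
    # hidden names that are not present in this particular frame.
    return df.drop(columns=sorted(state.hidden_columns), errors="ignore")
```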

It would be nice to have UI consistency. I don’t have a use case for exporting highlighted columns (scary to think about with million-row tables), but for relatively small tables it might be nice to pick a handful of columns and export them. For the use case I envision, though, it’s all about getting the overall widget to the state I want and then exporting from there.
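For the small-table case, exporting a hand-picked set of columns could look like the sketch below. The `export_columns` helper and its `max_rows` guard are hypothetical, standing in for the concern about million-row tables.

```python
import io

import pandas as pd


def export_columns(df: pd.DataFrame, columns: list[str], max_rows: int = 100_000) -> str:
    """Return the chosen columns as CSV text, refusing tables above max_rows."""
    if len(df) > max_rows:
        raise ValueError(f"{len(df)} rows exceeds the export limit of {max_rows}")
    buffer = io.StringIO()
    # index=False leaves the row index out of the exported CSV.
    df[columns].to_csv(buffer, index=False)
    return buffer.getvalue()
```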

I’ll open a PR once I sort out how I’m actually going to build the branch I forked. My native development environment is not ideal, and I don’t have a well-defined Notebook/GitHub container either. If you could share a Docker Compose file that works well for that, I’d appreciate it!

We have an internal document (which might not be completely up to date) on setting up a development environment: Setting up Modin Spreadsheet Development

Let me know if that helps!

The problem I’m trying to solve is needing to build a fresh container every time I want to test a source-level change. If I could do the development inside the container then I can iterate in a consistent environment and push changes out to GitHub from there. I’d only need to make a fresh container when things get really interesting.

How are you installing from source in the container? This:

pip install -e .

(run from the project root) installs the package in “editable” mode, so you can make Python changes without reinstalling anything. You should only need to restart the Jupyter kernel in the notebook; the updated source is loaded on the next import. Are you using Docker in interactive mode?
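The kernel restart matters because Python caches imported modules in `sys.modules`, so a plain re-import returns the old module object even after the source file changes. The self-contained demo below uses a throwaway module (`hypothetical_mod` is made up) to show that, and that `importlib.reload` re-executes the edited source. For a multi-module package like modin-spreadsheet, restarting the kernel is still the more reliable option.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_reload() -> tuple[int, int, int]:
    """Show cached re-import versus importlib.reload on an edited module."""
    old_flag = sys.dont_write_bytecode
    # Skip writing .pyc files so the reload below always reads the edited source.
    sys.dont_write_bytecode = True
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "hypothetical_mod.py"
        source.write_text("VALUE = 1\n")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            mod = importlib.import_module("hypothetical_mod")
            first = mod.VALUE
            # Simulate editing the source while the "kernel" is still running.
            source.write_text("VALUE = 22\n")
            # A plain re-import returns the cached module from sys.modules.
            again = importlib.import_module("hypothetical_mod").VALUE
            # reload() re-executes the module's (now edited) source.
            reloaded = importlib.reload(mod).VALUE
            return first, again, reloaded
        finally:
            sys.path.remove(tmp)
            sys.path_importer_cache.pop(tmp, None)
            sys.modules.pop("hypothetical_mod", None)
            sys.dont_write_bytecode = old_flag
```

Calling `demo_reload()` returns the value before the edit, the stale cached value, and the reloaded value, in that order.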

I didn’t realize that was such a powerful command. Will see what I can do with it.

@MichaelTiemannOSC Sorry for the late follow-up. I created a Dockerfile that you can use to run modin-spreadsheet on Docker: modin-spreadsheet/Dockerfile at e758f71c72ad4b0bbcd66f6e9dff86b48e2ce63b · richardlin047/modin-spreadsheet · GitHub.

You can either clone that branch or copy the file. You should be able to get to a Jupyter notebook with these steps:

  1. docker build -t modin-spreadsheet --progress=plain .
  2. docker run -p 8888:8888 modin-spreadsheet
  3. Click the Jupyter notebook link, with its domain, that appears in the terminal output.
  4. Navigate to the Jupyter notebook you want to use. The Dockerfile needs the source code, so if you want to use your own notebook, put it in the “modin-spreadsheet” directory.
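If you end up repeating steps 1 and 2 after every source change, a small Python wrapper can script them. This is a hypothetical convenience script, not part of the modin-spreadsheet repo; it assumes Docker is on your PATH, that you run it from the directory containing the Dockerfile, and that Jupyter listens on port 8888 inside the container as in the steps above.

```python
import shlex
import subprocess

CONTAINER_PORT = 8888  # Jupyter's port inside the container, per the steps above


def docker_commands(tag: str = "modin-spreadsheet", host_port: int = 8888) -> list[list[str]]:
    """Return the build and run commands from steps 1 and 2 as argv lists."""
    build = ["docker", "build", "-t", tag, "--progress=plain", "."]
    # -p maps host_port on your machine to the container's Jupyter port.
    run = ["docker", "run", "-p", f"{host_port}:{CONTAINER_PORT}", tag]
    return [build, run]


def run_all(dry_run: bool = True) -> None:
    for argv in docker_commands():
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops at the first failing step, e.g. a failed build.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

With `dry_run=True` it only prints the commands, which is a cheap way to confirm what will run before letting it call Docker.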

I’m not too familiar with auto-reloading, but you could rebuild the image whenever you change the source code.

For the most part, using pip install -e . without Docker means Python changes are picked up automatically after a kernel restart, with no rebuild needed. JavaScript changes have to be rebuilt manually by running npm install in the js folder. More details on rebuilding are in the README.
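Those rebuild rules can be written down as a small hypothetical helper that maps changed files to the step each one needs. The `js/` prefix reflects the “js folder” mentioned above; the README remains the authority on the full rebuild procedure.

```python
from pathlib import PurePosixPath


def rebuild_steps(changed_files: list[str]) -> list[str]:
    """Map changed source files to the rebuild advice above (hypothetical helper)."""
    steps: list[str] = []
    if any(PurePosixPath(f).suffix == ".py" for f in changed_files):
        # Editable install (pip install -e .): no reinstall, just restart the kernel.
        steps.append("restart the Jupyter kernel")
    if any(PurePosixPath(f).parts[:1] == ("js",) for f in changed_files):
        # Anything under js/ needs a manual rebuild from that folder.
        steps.append("cd js && npm install")
    return steps
```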

Let me know if you have any questions!

Thanks for the answer!