First time poster and new to modin!
I’m using modin to interrogate a large graph in parallel.
I seem to be running out of memory and am wondering how the graph is being stored in memory.
My use case looks like this:
    import modin.pandas as pd
    import networkx

    def main():
        def traverse_graph(row, graph):
            u, v = row.u, row.v
            shortest_path = networkx.shortest_path(graph, u, v)
            # do something with shortest path
            # (some_filtering_logic is a placeholder for the real condition)
            return [node for node in shortest_path if some_filtering_logic]

        # create some huge graph 40gb+
        graph = networkx.DiGraph()

        # load the graphing candidates from csv
        candidate_df = pd.read_csv('some_csv_file.csv')
        candidate_df["shortest_path_and_extras"] = candidate_df.apply(
            lambda row: traverse_graph(row, graph), axis=1
        )
I’m concerned the graph is being copied to each worker, and I’m wondering what the best way to avoid that is.
I tried graph_memory_reference = ray.put(graph) in conjunction with
ray.get(graph_memory_reference), but the issue seems to persist.
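
To make the concern concrete, here is a toy sketch of what I suspect is happening. It is not Modin's or Ray's actual internals, just a model using the standard library's pickle. The adjacency dict, the rows, and the "graph-0" handle are all hypothetical stand-ins: the dict plays the role of the big graph, and the handle plays the role of an object reference. It compares how many bytes get shipped when every task carries the graph versus when tasks carry only a small handle to one shared copy.

```python
import pickle

# Hypothetical stand-in for the large graph: a plain adjacency dict.
graph = {node: [node + 1] for node in range(100_000)}

# Hypothetical (u, v) candidate rows, one per task.
rows = [(0, 10), (5, 20), (7, 42)]

# Payload size if every task carries its own serialized copy of the graph.
per_task_with_graph = [len(pickle.dumps((row, graph))) for row in rows]

# Payload size if tasks carry only a small handle to one shared copy.
# "graph-0" is a made-up key standing in for an object reference.
graph_handle = "graph-0"
per_task_with_handle = [len(pickle.dumps((row, graph_handle))) for row in rows]

print("bytes shipped with the graph in every task:", sum(per_task_with_graph))
print("bytes shipped with a handle instead:", sum(per_task_with_handle))
```

If the real behaviour is anything like the first case, the memory cost would scale with the number of tasks or partitions, which would match what I'm seeing.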