Hey,
First-time poster and new to Modin!
I’m using Modin to interrogate a large graph in parallel.
I seem to be running out of memory and am wondering how the graph is being stored in memory.
My use case looks like this:
import modin.pandas as pd
import networkx

def main():
    def traverse_graph(row, graph):
        u, v = row.u, row.v
        shortest_path = networkx.shortest_path(graph, u, v)
        # do something with shortest path
        # some_filtering_logic is a placeholder for the real predicate
        return [node for node in shortest_path if some_filtering_logic(node)]

    # create some huge graph, 40 GB+
    graph = networkx.DiGraph()
    # load the graphing candidates from csv
    candidate_df = pd.read_csv('some_csv_file.csv')
    candidate_df["shortest_path_and_extras"] = candidate_df.apply(
        lambda row: traverse_graph(row, graph), axis=1
    )
I’m concerned that the graph is being passed to each worker, and I’m wondering what the best way to avoid that is.
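To make the concern concrete, here is a minimal sketch of what I think might be happening. It only uses the standard-library pickle module and a toy adjacency dict as a stand-in for the graph, so it is an assumption about the general mechanism (anything a task needs has to be serialized to reach another process), not a description of what Modin or Ray actually do internally. The name payload_size is made up for illustration.

```python
import pickle

# Toy stand-in for the real graph: a plain adjacency dict.
# (Hypothetical data; the real object is a 40 GB+ networkx.DiGraph.)
graph = {node: [node + 1, node + 2] for node in range(10_000)}


def payload_size(*task_args):
    """Return the number of bytes needed to pickle the given task arguments."""
    return len(pickle.dumps(task_args))


# One (u, v) candidate pair, like a single row of candidate_df.
row = (0, 5)

size_row_only = payload_size(row)
size_row_and_graph = payload_size(row, graph)

print(f"row only:       {size_row_only} bytes")
print(f"row plus graph: {size_row_and_graph} bytes")
```

If the graph travels with every task like this, the cost grows with the number of tasks and workers rather than being paid once, which would match the memory growth I’m seeing.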
I tried:
graph_memory_reference = ray.put(graph)
in conjunction with ray.get(graph_memory_reference),
but the issue seems to persist.