generated from microsoft/python-package-template
Open
Labels
discussion
Description
I think the current naming scheme of the `pp.io` module is somewhat inconsistent and confusing:
- `df_to_graph` is misleading since it does not result in a "proper" graph because the node attributes are missing. The opposite is also the case: the function `graph_to_df` is in reality more something along the lines of `edges_to_df`.
- There is a `write_csv` function but no `read_csv` function; instead, there are several `read_csv_...` functions - one for each class. Inside `write_csv`.
- Hardcoding the column names to `v` and `w` seems very inflexible to me.
I propose the following changes:
- Instead of `df_to_graph`, there should be a `read_dataframe(edges: Optional[DataFrame], nodes: Optional[DataFrame])` function that takes two optional dataframes, one for edges and one for nodes. This should then also be made consistent with all other formats, e.g. `read_csv(edges: Optional[str | Path], nodes: Optional[str | Path])`. Ideally, these methods should automatically attempt to infer the class, i.e. simple or temporal graph, with an additional option to set the class specifically.
- Instead of `graph_to_df`, we would then have a function `write_dataframe(...) -> DataFrame, DataFrame` that returns two dataframes by default, one for the nodes and one for the edges.
- Additionally, there could be functions like `edges_to_df` and `nodes_to_df` which would split the functionality of `write_dataframe` into two parts.
- For flexibility, all of these methods should have parameters like `source_col`, `target_col` and `time_col` that specify the name of the column that contains the source/target node and the timestamp.
- `PathData` could either be handled separately or included into the above with an additional optional parameter `read_csv(edges, nodes, paths)`.
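
To make the proposal more concrete, here is a rough sketch of what a unified `read_csv` with configurable column names and class inference could look like. This is not the existing `pp.io` code. `SimpleGraph` and `TemporalGraph` are hypothetical stand-ins for the real graph classes, the `node_col` and `temporal` parameters and the default `time_col="t"` are my own assumptions, and the sketch takes open file objects instead of `str | Path` so that it runs on its own with only the standard library.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Optional, TextIO


@dataclass
class SimpleGraph:
    """Hypothetical stand-in for a static graph class."""
    edges: list
    node_attrs: dict = field(default_factory=dict)


@dataclass
class TemporalGraph:
    """Hypothetical stand-in for a temporal graph class."""
    edges: list
    node_attrs: dict = field(default_factory=dict)


def read_csv(
    edges: Optional[TextIO] = None,
    nodes: Optional[TextIO] = None,
    source_col: str = "v",   # current hardcoded names kept as defaults
    target_col: str = "w",
    time_col: str = "t",     # assumed default name for the timestamp column
    node_col: str = "id",    # assumed name of the node identifier column
    temporal: Optional[bool] = None,
):
    """Read optional edge and node tables and return a graph object."""
    edge_rows = list(csv.DictReader(edges)) if edges is not None else []
    node_rows = list(csv.DictReader(nodes)) if nodes is not None else []

    # Infer the class unless the caller set it explicitly:
    # a time column in the edge table means a temporal graph.
    if temporal is None:
        temporal = bool(edge_rows) and time_col in edge_rows[0]

    # Every column except the identifier becomes a node attribute.
    node_attrs = {
        row[node_col]: {k: v for k, v in row.items() if k != node_col}
        for row in node_rows
    }

    if temporal:
        edge_list = [
            (r[source_col], r[target_col], float(r[time_col])) for r in edge_rows
        ]
        return TemporalGraph(edge_list, node_attrs)
    return SimpleGraph([(r[source_col], r[target_col]) for r in edge_rows], node_attrs)


# Usage: custom column names, class inferred from the presence of "ts".
edges_file = io.StringIO("src,dst,ts\na,b,1\nb,c,2\n")
nodes_file = io.StringIO("id,color\na,red\nb,blue\nc,green\n")
g = read_csv(edges_file, nodes_file, source_col="src", target_col="dst", time_col="ts")
print(type(g).__name__, g.edges)
```

The same parameter set (`source_col`, `target_col`, `time_col`) could then be reused for `read_dataframe` and the write functions, so that all formats behave the same way.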