Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
29 changes: 29 additions & 0 deletions rust/src/graph/cycles.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
use crate::errors::GrimpResult;
use crate::graph::pathfinding;
use crate::graph::{ExtendWithDescendants, Graph, ModuleToken};
use rustc_hash::{FxHashMap, FxHashSet};
use tap::Conv;

impl Graph {
pub fn find_shortest_cycle(
&self,
module: ModuleToken,
as_package: bool,
) -> GrimpResult<Option<Vec<ModuleToken>>> {
if as_package {
pathfinding::find_shortest_cycle(
self,
&module.conv::<FxHashSet<_>>().with_descendants(self),
&FxHashSet::default(),
&FxHashMap::default(),
)
} else {
pathfinding::find_shortest_cycle(
self,
&module.conv::<FxHashSet<_>>(),
&FxHashSet::default(),
&FxHashMap::default(),
)
}
}
}
4 changes: 2 additions & 2 deletions rust/src/graph/import_chain_queries.rs
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
use crate::errors::GrimpResult;
use crate::graph::pathfinding::{find_reach, find_shortest_path_bidirectional};
use crate::graph::pathfinding::{find_reach, find_shortest_path};
use crate::graph::{ExtendWithDescendants, Graph, ModuleToken};
use itertools::Itertools;
use rustc_hash::{FxHashMap, FxHashSet};
Expand Down Expand Up @@ -62,7 +62,7 @@ impl Graph {
excluded_modules: &FxHashSet<ModuleToken>,
excluded_imports: &FxHashMap<ModuleToken, FxHashSet<ModuleToken>>,
) -> GrimpResult<Option<Vec<ModuleToken>>> {
find_shortest_path_bidirectional(
find_shortest_path(
self,
from_modules,
to_modules,
Expand Down
1 change: 1 addition & 0 deletions rust/src/graph/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@ use std::sync::RwLock;
use string_interner::backend::StringBackend;
use string_interner::{DefaultSymbol, StringInterner};

pub mod cycles;
pub mod direct_import_queries;
pub mod graph_manipulation;
pub mod hierarchy_queries;
Expand Down
139 changes: 138 additions & 1 deletion rust/src/graph/pathfinding.rs
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
use crate::errors::{GrimpError, GrimpResult};
use crate::graph::{Graph, ModuleToken, EMPTY_MODULE_TOKENS};
use indexmap::{IndexMap, IndexSet};
use itertools::Itertools;
use rustc_hash::{FxHashMap, FxHashSet, FxHasher};
use slotmap::SecondaryMap;
use std::hash::BuildHasherDefault;
Expand Down Expand Up @@ -28,7 +29,8 @@ pub fn find_reach(
&seen.into_iter().collect::<FxHashSet<_>>() - from_modules
}

pub fn find_shortest_path_bidirectional(
/// Finds the shortest path, via a bidirectional BFS.
pub fn find_shortest_path(
graph: &Graph,
from_modules: &FxHashSet<ModuleToken>,
to_modules: &FxHashSet<ModuleToken>,
Expand All @@ -39,6 +41,39 @@ pub fn find_shortest_path_bidirectional(
return Err(GrimpError::SharedDescendants);
}

_find_shortest_path(
graph,
from_modules,
to_modules,
excluded_modules,
excluded_imports,
)
}

/// Finds the shortest cycle from `modules` to `modules`, via a bidirectional BFS.
pub fn find_shortest_cycle(
graph: &Graph,
modules: &FxHashSet<ModuleToken>,
excluded_modules: &FxHashSet<ModuleToken>,
excluded_imports: &FxHashMap<ModuleToken, FxHashSet<ModuleToken>>,
) -> GrimpResult<Option<Vec<ModuleToken>>> {
// Exclude imports internal to `modules`
let mut excluded_imports = excluded_imports.clone();
for (m1, m2) in modules.iter().tuple_combinations() {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, I dont know Rust at all, I will try to understand the full code and learn a bit of Rust. Maybe then I will add some useful comments to the PR.

According to those lines (question about logic, not Rust details): what is the logic in lines 62-64 for? Why do we remove internal imports between modules in modules?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably the way to understand this is by looking at the test case test_ignores_internal_imports_when_as_package_is_true in test_cycles.py. If we remove L62-65 then that test case will start failing.

AssertionError: assert ('colors.blue',) == ('colors.red', 'x', 'y', 'z', 'colors.blue')

There is an import between colors.red and colors.blue (and the reverse import too). We don't want to follow either of those imports for the BFS, else we will short circuit and never end up leaving the colors package to find the → x → y → z → path.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that make sense?

Copy link

@K4liber K4liber Mar 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, makes sense. I mean, I understand now why we have those lines. Thanks for the explanation. But should we have them?

In the test case test_ignores_internal_imports_when_as_package_is_true, I think the shortest cycle could be ("colors.red", "colors.blue"). I think we could consider all child packages of "colors". Why do I think that?

I rather think of package "colors" as a set of all descendant modules inside all sub-packages., not only children.

In case of finding cycles in the whole project, instead of iterating over every single package in some package tree, I would rather just provide a single package. For example, I would like to fix "colors.red". If grimp does not return any cycle for "colors.red", it means that the whole package and all sub-packages inside are fine when it comes to importing.

Maybe we could introduce another parameter called something like "consider_sub_packages: bool = True"? Based on the value we could execute lines 62-64 or not. What do you think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the function is behaving as I intended it (we're finding the shortest chain from the colors package to itself, but that chain needs to leave the colors package and not just be entirely internal to it).

This behaviour is consistent with e.g. find_shortest_chain. E.g.

p1.m1 → p2.m1
        p2.m2 → p3.m1

In the above example find_shortest_chain("p1", "p3", as_packages=True) will return None, since there is no direct import chain between p1 and p3 (the import p2.m1 → p2.m2 is missing). p1 and p3 are treated as entire packages, but p2 is not.

That said, judging by your reply above and #194 (comment) it sounds like find_shortest_cycle is maybe not behaving in the way you needed for your contract. I think I'm a bit confused about the contract you're trying to express. Perhaps we need to write down more clearly the behaviour you're trying to achieve?

Copy link
Collaborator Author

@Peter554 Peter554 Mar 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about continuing discussion on a high level functionality here (@seddonym already proposed a solid documentation): #189 (comment) and then go back for the low level implementation in this PR after we agree to the scope of high level functionality?

Yes sounds good to me.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coming back to the topic, I guess in the high lvl functionality we can loop over sub-packages of the package in our interest executing find_shortest_cycle. And it will return the internal cycles if any.

For example, having the following dependencies:

graph.add_module("colors")
graph.add_import(importer="colors.red", imported="colors.blue")
graph.add_import(importer="colors.blue", imported="colors.red")
graph.add_import(importer="colors.red", imported="x")
graph.add_import(importer="x", imported="y")
# graph.add_import(importer="y", imported="z")
graph.add_import(importer="z", imported="colors.blue")

we would like to find cycles (including internal ones) for "colors". So we execute:

graph.find_shortest_cycle("colors", as_package=True)
>>> None

We go further and start with a first sub-package of "colors" which is "colors.blue":

graph.find_shortest_cycle("colors.blue", as_package=True)
>>> ('colors.blue', 'colors.red', 'colors.blue')

Wanting to find all cycles, we execute further, selecting the second sub-package "colors.red":

graph.find_shortest_cycle("colors.red", as_package=True)
('colors.red', 'colors.blue', 'colors.red')

We get the semantically equivalent cycle.

I wonder if optionally disabling those lines by having a parameter consider_sub_packages or consider_internal_packages could save some computation time. It could potentially reduce the overlapping calculations on a frequently executed type of request from the high-lvl functionality.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better if the entirety of find_cycles_between_siblings is implemented at the Rust layer, in the same way as find_illegal_dependencies_for_layers. Does anyone disagree?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Thinner the Python layer is, faster executions we get. As @Peter554 mentioned in the PR description we can have both, low and high lvl functionality implemented in Grimp. Then we can utilize find_shortest_cycle in find_cycles_between_siblings, making sure we do not perform overlapping computations.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed here: #189 (comment)
we can utilize the find_shortest_cycle method, as it is implemented now, in the high level method that will be the base of the contract.

excluded_imports.entry(*m1).or_default().insert(*m2);
excluded_imports.entry(*m2).or_default().insert(*m1);
}

_find_shortest_path(graph, modules, modules, excluded_modules, &excluded_imports)
}

fn _find_shortest_path(
graph: &Graph,
from_modules: &FxHashSet<ModuleToken>,
to_modules: &FxHashSet<ModuleToken>,
excluded_modules: &FxHashSet<ModuleToken>,
excluded_imports: &FxHashMap<ModuleToken, FxHashSet<ModuleToken>>,
) -> GrimpResult<Option<Vec<ModuleToken>>> {
let mut predecessors: FxIndexMap<ModuleToken, Option<ModuleToken>> = from_modules
.clone()
.into_iter()
Expand Down Expand Up @@ -123,3 +158,105 @@ fn import_is_excluded(
.contains(to_module)
}
}

#[cfg(test)]
mod test_find_shortest_cycle {
use super::*;

#[test]
fn test_finds_cycle_single_module() -> GrimpResult<()> {
let mut graph = Graph::default();
let foo = graph.get_or_add_module("foo").token;
let bar = graph.get_or_add_module("bar").token;
let baz = graph.get_or_add_module("baz").token;
let x = graph.get_or_add_module("x").token;
let y = graph.get_or_add_module("y").token;
let z = graph.get_or_add_module("z").token;
// Shortest cycle
graph.add_import(foo, bar);
graph.add_import(bar, baz);
graph.add_import(baz, foo);
// Longer cycle
graph.add_import(foo, x);
graph.add_import(x, y);
graph.add_import(y, z);
graph.add_import(z, foo);

let path = find_shortest_cycle(
&graph,
&foo.into(),
&FxHashSet::default(),
&FxHashMap::default(),
)?;
assert_eq!(path, Some(vec![foo, bar, baz, foo]));

graph.remove_import(baz, foo);

let path = find_shortest_cycle(
&graph,
&foo.into(),
&FxHashSet::default(),
&FxHashMap::default(),
)?;
assert_eq!(path, Some(vec![foo, x, y, z, foo]));

Ok(())
}

#[test]
fn test_returns_none_if_no_cycle() -> GrimpResult<()> {
let mut graph = Graph::default();
let foo = graph.get_or_add_module("foo").token;
let bar = graph.get_or_add_module("bar").token;
let baz = graph.get_or_add_module("baz").token;
graph.add_import(foo, bar);
graph.add_import(bar, baz);

let path = find_shortest_cycle(
&graph,
&foo.into(),
&FxHashSet::default(),
&FxHashMap::default(),
)?;

assert_eq!(path, None);

Ok(())
}

#[test]
fn test_finds_cycle_multiple_module() -> GrimpResult<()> {
let mut graph = Graph::default();

graph.get_or_add_module("colors");
let red = graph.get_or_add_module("colors.red").token;
let blue = graph.get_or_add_module("colors.blue").token;
let a = graph.get_or_add_module("a").token;
let b = graph.get_or_add_module("b").token;
let c = graph.get_or_add_module("c").token;
let d = graph.get_or_add_module("d").token;

// The computation should not be confused by these two imports internal to `modules`.
graph.add_import(red, blue);
graph.add_import(blue, red);
// This is the part we expect to find.
graph.add_import(red, a);
graph.add_import(a, b);
graph.add_import(b, blue);
// A longer path.
graph.add_import(a, c);
graph.add_import(c, d);
graph.add_import(d, b);

let path = find_shortest_cycle(
&graph,
&FxHashSet::from_iter([red, blue]),
&FxHashSet::default(),
&FxHashMap::default(),
)?;

assert_eq!(path, Some(vec![red, a, b, blue]));

Ok(())
}
}
19 changes: 19 additions & 0 deletions rust/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -353,6 +353,25 @@ impl GraphWrapper {
PySet::new(py, chains)
}

#[pyo3(signature = (module, as_package=false))]
pub fn find_shortest_cycle(
&self,
module: &str,
as_package: bool,
) -> PyResult<Option<Vec<String>>> {
let module = self.get_visible_module_by_name(module)?.token();
Ok(self
._graph
.find_shortest_cycle(module, as_package)?
.map(|chain| {
chain
.iter()
.into_module_iterator(&self._graph)
.names()
.collect()
}))
}

#[pyo3(signature = (layers, containers))]
pub fn find_illegal_dependencies_for_layers<'py>(
&self,
Expand Down
9 changes: 9 additions & 0 deletions src/grimp/adaptors/graph.py
Original file line number Diff line number Diff line change
Expand Up @@ -126,6 +126,15 @@ def find_shortest_chains(
def chain_exists(self, importer: str, imported: str, as_packages: bool = False) -> bool:
return self._rustgraph.chain_exists(importer, imported, as_packages)

def find_shortest_cycle(
self, module: str, as_package: bool = False
) -> Optional[Tuple[str, ...]]:
if not self._rustgraph.contains_module(module):
raise ValueError(f"Module {module} is not present in the graph.")

cycle = self._rustgraph.find_shortest_cycle(module, as_package)
return tuple(cycle) if cycle else None

def find_illegal_dependencies_for_layers(
self,
layers: Sequence[Layer | str | set[str]],
Expand Down
16 changes: 16 additions & 0 deletions src/grimp/application/ports/graph.py
Original file line number Diff line number Diff line change
Expand Up @@ -282,6 +282,22 @@ def chain_exists(self, importer: str, imported: str, as_packages: bool = False)
"""
raise NotImplementedError

# Cycles
# ------

@abc.abstractmethod
def find_shortest_cycle(
self, module: str, as_package: bool = False
) -> Optional[Tuple[str, ...]]:
"""
Returns the shortest import cycle from `module` to itself, or `None` if no cycle exist.

Optional args:
as_package: Whether or not to treat the supplied module as an individual module,
or as an entire subpackage (including any descendants).
"""
raise NotImplementedError

# High level analysis
# -------------------

Expand Down
55 changes: 55 additions & 0 deletions tests/unit/adaptors/graph/test_cycles.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,55 @@
import pytest

from grimp.adaptors.graph import ImportGraph


class TestFindShortestCycle:
@pytest.mark.parametrize(
"expected_cycle",
[
("foo", "bar", "foo"),
("foo", "bar", "baz", "foo"),
],
)
def test_finds_shortest_cycle_when_exists(self, expected_cycle):
graph = ImportGraph()
# Shortest cycle
for importer, imported in zip(expected_cycle[:-1], expected_cycle[1:]):
graph.add_import(importer=importer, imported=imported)
# Longer cycle
graph.add_import(importer="foo", imported="x")
graph.add_import(importer="x", imported="y")
graph.add_import(importer="y", imported="z")
graph.add_import(importer="z", imported="foo")

assert graph.find_shortest_cycle("foo") == expected_cycle

graph.remove_import(importer=expected_cycle[-2], imported=expected_cycle[-1])

assert graph.find_shortest_cycle("foo") == ("foo", "x", "y", "z", "foo")

def test_returns_none_if_no_cycle_exists(self):
graph = ImportGraph()
graph.add_import(importer="foo", imported="bar")
graph.add_import(importer="bar", imported="baz")
# graph.add_import(importer="baz", imported="foo") # This import is missing -> No cycle.

assert graph.find_shortest_cycle("foo") is None

def test_ignores_internal_imports_when_as_package_is_true(self):
graph = ImportGraph()
graph.add_module("colors")
graph.add_import(importer="colors.red", imported="colors.blue")
graph.add_import(importer="colors.blue", imported="colors.red")
graph.add_import(importer="colors.red", imported="x")
graph.add_import(importer="x", imported="y")
graph.add_import(importer="y", imported="z")
graph.add_import(importer="z", imported="colors.blue")

assert graph.find_shortest_cycle("colors", as_package=True) == (
"colors.red",
"x",
"y",
"z",
"colors.blue",
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried the following test:

    def test_cycle_root_independent(self):
        graph = ImportGraph()
        graph.add_module("package_1")
        graph.add_module("package_2")
        graph.add_module("package_3")
        graph.add_import(importer="package_2.module_b", imported="package_1.package_3.module_c")
        graph.add_import(importer="package_1.module_a", imported="package_2.module_b")
        assert graph.find_shortest_cycle("package_1", as_package=True) == (
            'package_1.module_a',
            'package_2.module_b',
            'package_1.package_3.module_c'
        )
        # should the function return the same for package_2 or not?
        assert graph.find_shortest_cycle("package_2", as_package=True) == (
            'package_1.module_a',
            'package_2.module_b',
            'package_1.package_3.module_c'
        )

Why do we only find the cycle while using "package_1" as root?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, so basically, providing as_package=True is equivalent to changing all the internal modules paths of the package to just the package name. So in case of package_1 the two following results are almost equivalent (they differ by the module names in the returned cycle):

graph.add_import(importer="package_2.module_b", imported="package_1.package_3.module_c")
graph.add_import(importer="package_1.module_a", imported="package_2.module_b")
assert graph.find_shortest_cycle("package_1", as_package=True) == (
            'package_1.module_a',
            'package_2.module_b',
            'package_1.package_3.module_c'
)
graph.add_import(importer="package_2.module_b", imported="package_1")
graph.add_import(importer="package_1", imported="package_2.module_b")
assert graph.find_shortest_cycle("package_1", as_package=False) == (
            'package_1',
            'package_2.module_b',
            'package_1'
)

And in the case of package_2 we have the following results:
1.

graph.add_import(importer="package_2.module_b", imported="package_1.package_3.module_c")
graph.add_import(importer="package_1.module_a", imported="package_2.module_b")
assert graph.find_shortest_cycle("package_2", as_package=True) == None
graph.add_import(importer="package_2", imported="package_1.package_3.module_c")
graph.add_import(importer="package_1.module_a", imported="package_2")
assert graph.find_shortest_cycle("package_2", as_package=False) == None

We get None, because there is no any module in package_2 having a dependency circle to itself through any package outside of package_2.

Loading