| title | subtitle | icon |
|---|---|---|
SuchTree's API |
A somewhat consice description of the core features of SuchTree. |
material/file-document-outline |
The SuchTree class provides high-performance phylogenetic tree manipulation using Cython. It supports:
- Fast tree traversal and node queries
- Patristic distance calculations
- Topological analysis
- Multiple tree formats (Newick, URL, file path)
- Integration with NetworkX and igraph
Some best practices to keep in mind :
- Use node IDs for performance-critical code
- Prefer bulk methods (
distances_bulk) for multiple calculations - Cache frequently-used properties (like RED values)
- Use traversal generators for memory efficiency
The supporting SuchLinkedTrees class provides several useful bookkeeping features for working with
co-phylogeny datasets, most importantly the ability to perform reciprocal masking of one tree by
clades in its linked tree. Subset masking is thread-safe, fast and non-destructive. However, please
note that SuchTree 1.3 is the last release that will include this interface for SuchLinkedTrees. The
next major release will include a modernized, multi-tree container with a new interface.
class SuchTree(tree_input: Union[str, Path])Construct from:
- Newick string
- File path
- URL (http/https/ftp)
Example:
tree = SuchTree("(A:0.1,B:0.2,(C:0.3,D:0.4));")
tree = SuchTree("https://example.com/tree.newick")| Property | Type | Description | Example |
|---|---|---|---|
size |
int |
Total nodes | tree.size → 7 |
depth |
int |
Max depth | tree.depth → 3 |
num_leaves |
int |
Leaf count | tree.num_leaves → 4 |
root_node |
int |
Root ID | tree.root_node → 0 |
polytomy_epsilon |
float |
Polytomy resolution | 1e-20 |
| Property | Type | Description | Contains |
|---|---|---|---|
leaves |
Dict[str, int] |
Name → ID | {'A': 1, 'B': 2} |
leaf_nodes |
Dict[int, str] |
ID → Name | {1: 'A', 2: 'B'} |
internal_nodes |
np.ndarray |
Internal IDs | [0, 3, 4] |
all_nodes |
np.ndarray |
All IDs | [0, 1, 2, 3, 4, 5, 6] |
leaf_node_ids |
np.ndarray |
Leaf IDs | [1, 2, 5, 6] |
leaf_names |
list |
Leaf names | ['A', 'B', 'C', 'D'] |
Here are some examples of operations you can do with SuchTree.
# Initialize a tree and print some basic properties
tree = SuchTree("(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;")
print(f"Tree depth: {tree.depth}")
print(f"Leaf names: {tree.leaf_names}")
# Find some node relationships
node_id = tree.leaves['A']
parent_id = tree.get_parent(node_id)
children = tree.get_children(parent_id)
# Distance analysis
dist = tree.distance('A', 'C')
dist_matrix = tree.pairwise_distances(['A', 'B', 'C', 'D'])
# Calculate RED values
red_values = tree.relative_evolutionary_divergence()
# Export tree as a networkx graph (requires networkx)
nx_graph = tree.to_networkx_graph()NodeNotFoundError: Invalid leaf nameInvalidNodeError: Invalid node IDTreeStructureError: Invalid tree operationsValueError: Invalid input formatTypeError: Incorrect argument type
get_parent(node: Union[int, str]) -> intGet immediate parent node for a given node.
Args: node: Node identifier as either integer ID or leaf name string
Returns: Integer ID of parent node. Returns -1 if called on root node.
Raises: NodeNotFoundError: If leaf name doesn't exist in the tree InvalidNodeError: If node ID is out of valid range (0 <= id < tree.size)
Example:
parent_id = tree.get_parent("A")
parent_id = tree.get_parent(5)get_children(node: Union[int, str]) -> Tuple[int, int]Get direct children of a node.
Args: node: Node identifier as either integer ID or leaf name string
Returns: Tuple of (left_child, right_child) node IDs. Returns (-1, -1) for leaf nodes.
Raises: NodeNotFoundError: If leaf name doesn't exist InvalidNodeError: If node ID is invalid
Note:
For multifurcating trees, only the first two children are returned. Use
traverse_children() method for complete child iteration.
get_ancestors(node: Union[int, str]) -> Generator[int, None, None]Generate ancestor node IDs from node to root. Yields parent IDs in ascending order from immediate parent to root.
get_descendants(node_id: int) -> Generator[int, None, None]Generate all descendant node IDs in depth-first order. Includes the starting node in the output.
get_leaves(node: Union[int, str]) -> np.ndarrayGet array of leaf node IDs descended from a given node. Uses efficient buffer reuse for performance.
get_support(node: Union[int, str]) -> floatRetrieve node support value. Returns -1 if no support available. Works for both internal nodes and leaves.
is_leaf(node: Union[int, str]) -> boolCheck if node is a leaf. Uses optimized Cython implementation for fast checking.
is_internal(node: Union[int, str]) -> boolCheck if node is internal. Simply returns negation of is_leaf but provides clearer intent.
is_ancestor(ancestor: Union[int, str], descendant: Union[int, str]) -> intTest ancestral relationship. Returns:
1if ancestor of descendant-1if descendant is ancestor0if no direct relationship
is_descendant(descendant: Union[int, str], ancestor: Union[int, str]) -> boolConvenience method that returns True if descendant is indeed a descendant of ancestor.
is_root(node: Union[int, str]) -> boolCheck if node is the tree root. Uses direct comparison with stored root node ID.
is_sibling(node1: Union[int, str], node2: Union[int, str]) -> boolCheck if two nodes share the same parent. Automatically returns False if either node is root.
has_children(node: Union[int, str]) -> boolDetermine if node has any children. Equivalent to is_internal but may be more intuitive for some users.
has_parent(node: Union[int, str]) -> boolCheck if node has a parent (i.e., is not root). Returns negation of is_root.
common_ancestor(a: Union[int, str], b: Union[int, str]) -> intFind most recent common ancestor of two nodes. Uses optimized MRCA algorithm with visited node tracking.
path_between_nodes(a: Union[int, str], b: Union[int, str]) -> List[int]Get node IDs forming the path between two nodes through their common ancestor. Returns list from a -> MRCA -> b.
SuchTree was originally built to compute patristic (leaf-to-leaf) distances as efficiently as possible. You have two choices to make :
- Do you want to use leaf names or leaf IDs?
- Do you want to to calculate distances for single a pair of leafs, or a table of leaf pairs?
It is important to remember that even if you have two trees with exactly the same
leaf names, the leafs will have different node IDs if the topologies are different.
For those cases, it is better to use use the leaf name to ID mappings (SuchTree.leaves
and SuchTree.leaf_nodes) to associate your leaf nodes with other data.
distance(a: Union[int, str], b: Union[int, str]) -> floatCalculate patristic distance between two nodes along the tree.
Args: a: First node identifier (ID or name) b: Second node identifier (ID or name)
Returns: Sum of branch lengths along the path between nodes via their most recent common ancestor (MRCA)
Raises: NodeNotFoundError: If either node name doesn't exist InvalidNodeError: If either node ID is invalid
Complexity: O(h) where h is the height of the tree. Uses cached ancestor paths for optimal performance.
Example:
dist = tree.distance("A", "B")
dist = tree.distance(2, 5)distances_bulk(pairs: np.ndarray) -> np.ndarrayEfficiently compute distances for multiple node pairs. Accepts (n, 2) array of node IDs. Uses Cython nogil implementation.
distances_by_name(pairs: List[Tuple[str, str]]) -> List[float]Convenience wrapper for bulk distance calculation using leaf names instead of IDs.
pairwise_distances(nodes: list = None) -> np.ndarrayGenerate full distance matrix for specified nodes (all leaves by default). Returns symmetric numpy array.
distance_to_root(node: Union[int, str]) -> floatCalculate total branch length from node to root. Optimized with cumulative distance caching.
nearest_neighbors(node: Union[int, str], k=1) -> List[Tuple[Union[int, str], float]]Find k nearest neighbors to a node. Can search among specific nodes or all leaves by default.
SuchTree includes a collection of generators for traversing the tree in different ways. They all work the same way, other than the different order of traversal.
traverse_inorder(include_distances: bool = True) -> GeneratorIn-order traversal (left, root, right). Yields node IDs or (ID, distance) tuples.
traverse_preorder(from_node: Union[int, str] = None) -> Generator
"""
Iterate through nodes in pre-order traversal (parent before children).
Args:
from_node: Starting node (default: root). Can be ID or name.
Yields:
Node IDs in traversal order
Raises:
NodeNotFoundError: If from_node name doesn't exist
InvalidNodeError: If from_node ID is invalid
Memory:
O(h) space complexity due to stack implementation, where h is tree height
Example:
```python
for node_id in tree.traverse_preorder():
print(f"Visiting node {node_id}")traverse_postorder(from_node: Union[int, str] = None) -> GeneratorPost-order traversal (left, right, root). Useful for dependency resolution.
traverse_levelorder(from_node: Union[int, str] = None) -> GeneratorBreadth-first level order traversal. Yields nodes by depth level.
traverse_leaves_only(from_node: Union[int, str] = None) -> GeneratorEfficient traversal that only yields leaf nodes. Skips internal nodes.
traverse_internal_only(from_node: Union[int, str] = None) -> GeneratorTraversal that skips leaf nodes. Useful for operations only on internal nodes.
traverse_with_depth(from_node: Union[int, str] = None) -> Generator[Tuple[int, int], None, None]Traversal yielding (node ID, depth) pairs. Depth starts at 0 for root.
traverse_with_distances(from_node: Union[int, str] = None) -> Generator[Tuple[int, float, float], None, None]Traversal yielding (node ID, distance to parent, cumulative distance to root).
bipartition(node: Union[int, str], by_id=False) -> frozenset
bipartitions(by_id=False) -> Generator[frozenset, None, None]
quartet_topology(a: Union[int, str], b: Union[int, str], c: Union[int, str], d: Union[int, str]) -> frozenset
quartet_topologies_bulk(quartets: Union[list, np.ndarray]) -> np.ndarray
quartet_topologies_by_name(quartets: List[Tuple[str, str, str, str]]) -> List[frozenset]adjacency_matrix(from_node: Union[int, str] = None) -> Dict[str, Any]
laplacian_matrix(from_node: Union[int, str] = None) -> Dict[str, Any]
incidence_matrix(from_node: Union[int, str] = None) -> Dict[str, Any]
degree_sequence(from_node: Union[int, str] = None) -> Dict[str, Any]to_networkx_graph(from_node: Union[int, str] = None) -> 'networkx.Graph'
to_newick(include_support=True, include_distances=True) -> str
relative_evolutionary_divergence() -> Dict[int, float]