topobench.transforms.data_manipulations.sheaf_connlap_encodings module#

Sheaf Connection Laplacian Positional Encoding (ConnLap) Transform.

class BaseTransform#

Bases: ABC

An abstract base class for writing transforms.

Transforms are a general way to modify and customize Data or HeteroData objects, either by implicitly passing them as an argument to a Dataset, or by applying them explicitly to individual Data or HeteroData objects:

import torch_geometric.transforms as T
from torch_geometric.datasets import TUDataset

transform = T.Compose([T.ToUndirected(), T.AddSelfLoops()])

dataset = TUDataset(path, name='MUTAG', transform=transform)
data = dataset[0]  # Implicitly transform data on every access.

data = TUDataset(path, name='MUTAG')[0]
data = transform(data)  # Explicitly transform data.
abstractmethod forward(data)#
class Data(x=None, edge_index=None, edge_attr=None, y=None, pos=None, time=None, **kwargs)#

Bases: BaseData, FeatureStore, GraphStore

A data object describing a homogeneous graph. The data object can hold node-level, link-level and graph-level attributes. In general, Data tries to mimic the behavior of a regular :python:`Python` dictionary. In addition, it provides useful functionality for analyzing graph structures, and provides basic PyTorch tensor functionalities. See here for the accompanying tutorial.

from torch_geometric.data import Data

data = Data(x=x, edge_index=edge_index, ...)

# Add additional arguments to `data`:
data.train_idx = torch.tensor([...], dtype=torch.long)
data.test_mask = torch.tensor([...], dtype=torch.bool)

# Analyzing the graph structure:
data.num_nodes
>>> 23

data.is_directed()
>>> False

# PyTorch tensor functionality:
data = data.pin_memory()
data = data.to('cuda:0', non_blocking=True)
Parameters:
  • x (torch.Tensor, optional) – Node feature matrix with shape [num_nodes, num_node_features]. (default: None)

  • edge_index (LongTensor, optional) – Graph connectivity in COO format with shape [2, num_edges]. (default: None)

  • edge_attr (torch.Tensor, optional) – Edge feature matrix with shape [num_edges, num_edge_features]. (default: None)

  • y (torch.Tensor, optional) – Graph-level or node-level ground-truth labels with arbitrary shape. (default: None)

  • pos (torch.Tensor, optional) – Node position matrix with shape [num_nodes, num_dimensions]. (default: None)

  • time (torch.Tensor, optional) – The timestamps for each event with shape [num_edges] or [num_nodes]. (default: None)

  • **kwargs (optional) – Additional attributes.

classmethod from_dict(mapping)#

Creates a Data object from a dictionary.

__init__(x=None, edge_index=None, edge_attr=None, y=None, pos=None, time=None, **kwargs)#
connected_components()#

Extracts connected components of the graph using a union-find algorithm. The components are returned as a list of Data objects, where each object represents a connected component of the graph.

data = Data()
data.x = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
data.y = torch.tensor([[1.1], [2.1], [3.1], [4.1]])
data.edge_index = torch.tensor(
    [[0, 1, 2, 3], [1, 0, 3, 2]], dtype=torch.long
)

components = data.connected_components()
print(len(components))
>>> 2

print(components[0].x)
>>> Data(x=[2, 1], y=[2, 1], edge_index=[2, 2])
Returns:

A list of disconnected components.

Return type:

List[Data]

debug()#
edge_subgraph(subset)#

Returns the induced subgraph given by the edge indices subset. Will currently preserve all the nodes in the graph, even if they are isolated after subgraph computation.

Parameters:

subset (LongTensor or BoolTensor) – The edges to keep.

get_all_edge_attrs()#

Returns all registered edge attributes.

get_all_tensor_attrs()#

Obtains all feature attributes stored in Data.

is_edge_attr(key)#

Returns True if the object at key key denotes an edge-level tensor attribute.

is_node_attr(key)#

Returns True if the object at key key denotes a node-level tensor attribute.

stores_as(data)#
subgraph(subset)#

Returns the induced subgraph given by the node indices subset.

Parameters:

subset (LongTensor or BoolTensor) – The nodes to keep.

to_dict()#

Returns a dictionary of stored key/value pairs.

to_heterogeneous(node_type=None, edge_type=None, node_type_names=None, edge_type_names=None)#

Converts a Data object to a heterogeneous HeteroData object. For this, node and edge attributes are splitted according to the node-level and edge-level vectors node_type and edge_type, respectively. node_type_names and edge_type_names can be used to give meaningful node and edge type names, respectively. That is, the node_type 0 is given by node_type_names[0]. If the Data object was constructed via to_homogeneous(), the object can be reconstructed without any need to pass in additional arguments.

Parameters:
  • node_type (torch.Tensor, optional) – A node-level vector denoting the type of each node. (default: None)

  • edge_type (torch.Tensor, optional) – An edge-level vector denoting the type of each edge. (default: None)

  • node_type_names (List[str], optional) – The names of node types. (default: None)

  • edge_type_names (List[Tuple[str, str, str]], optional) – The names of edge types. (default: None)

to_namedtuple()#

Returns a NamedTuple of stored key/value pairs.

update(data)#

Updates the data object with the elements from another data object. Added elements will override existing ones (in case of duplicates).

validate(raise_on_error=True)#

Validates the correctness of the data.

property batch: Tensor | None#

!! processed by numpydoc !!

property edge_attr: Tensor | None#

!! processed by numpydoc !!

property edge_index: Tensor | None#

!! processed by numpydoc !!

property edge_stores: List[EdgeStorage]#

!! processed by numpydoc !!

property edge_weight: Tensor | None#

!! processed by numpydoc !!

property face: Tensor | None#

!! processed by numpydoc !!

property node_stores: List[NodeStorage]#

!! processed by numpydoc !!

property num_edge_features: int#

Returns the number of features per edge in the graph.

property num_edge_types: int#

Returns the number of edge types in the graph.

property num_faces: int | None#

Returns the number of faces in the mesh.

property num_features: int#

Returns the number of features per node in the graph. Alias for num_node_features.

property num_node_features: int#

Returns the number of features per node in the graph.

property num_node_types: int#

Returns the number of node types in the graph.

property num_nodes: int | None#

Returns the number of nodes in the graph.

Note

The number of nodes in the data object is automatically inferred in case node-level attributes are present, e.g., data.x. In some cases, however, a graph may only be given without any node-level attributes. :pyg:`PyG` then guesses the number of nodes according to edge_index.max().item() + 1. However, in case there exists isolated nodes, this number does not have to be correct which can result in unexpected behavior. Thus, we recommend to set the number of nodes in your data object explicitly via data.num_nodes = .... You will be given a warning that requests you to do so.

property pos: Tensor | None#

!! processed by numpydoc !!

property stores: List[BaseStorage]#

!! processed by numpydoc !!

property time: Tensor | None#

!! processed by numpydoc !!

property x: Tensor | None#

!! processed by numpydoc !!

property y: Tensor | int | float | None#

!! processed by numpydoc !!

class SheafConnLapPE(max_pe_dim, stalk_dim=3, include_first=False, concat_to_x=True, eps=1e-06, **kwargs)#

Bases: BaseTransform

Sheaf Connection Laplacian Positional Encoding (SheafConnLapPE) transform.

Based on “Sheaf-based Positional Encodings for Graph Neural Networks” by He, Bodnar & Liò (NeurIPS 2023 Workshop / PMLR 2024). https://openreview.net/pdf?id=ZtAabWUPu3

The Connection Laplacian generalises the standard graph Laplacian by replacing each scalar off-diagonal entry (-1) with a d×d orthogonal restriction map — a rotation encoding the geometric alignment between the local node-feature neighbourhoods of the two endpoints.

For each edge (v, u), the algorithm:

  1. Runs local PCA on the 1-hop feature neighbourhood of v and u separately, yielding orthonormal bases B_v, B_u ∈ R^{p×d} that approximate the local tangent spaces T_{x_v}M and T_{x_u}M under the manifold assumption.

  2. Solves the orthogonal Procrustes problem to find the rotation O_{vu} ∈ O(d) that best maps B_v onto B_u (closed form: SVD of B_v^T B_u).

  3. Sets the off-diagonal block L_F[v, u] = -O_{vu}.

The resulting nd×nd block matrix L_F is symmetric positive semi-definite. Its k smallest non-trivial eigenvectors (each reshaped from nd to n×d) are concatenated column-wise to form a PE of total dimension k×d per node.

On homophilic edges (similar features) O_{vu} ≈ I and the Connection Laplacian closely resembles the standard Laplacian. On heterophilic edges O_{vu} is a non-trivial rotation, introducing cross-dimensional coupling that encodes semantic disagreement — information the standard Laplacian cannot represent.

Note

Feature dimension requirement : data.x must be present and data.x.shape[1] >= stalk_dim. The method assumes that node features lie near a stalk_dim-dimensional manifold; if feature_dim < stalk_dim this assumption is violated and the PCA basis would contain zero columns, making the Procrustes rotation degenerate and breaking the PSD property of L_F. A ValueError is raised in this case.

Isolated nodes : For isolated nodes (degree 0), the diagonal block of L_F is the zero matrix, making D^{-1/2} undefined. The normalisation substitutes 1.0 for these zero diagonal entries, which is equivalent to adding a unit self-loop for numerical purposes. The eigenvector components of isolated nodes are still well-defined (the corresponding rows of L_F remain all-zero), but their PE values will reflect their position in the global spectrum rather than local connectivity.

Parameters:
max_pe_dimint

Total output PE dimension. Must be divisible by stalk_dim. Internally, the number of eigenvectors used is k = max_pe_dim // stalk_dim, so the output shape is always [num_nodes, max_pe_dim] (zero-padded if fewer eigenvectors are available).

stalk_dimint, optional

Dimension d of each stalk / restriction map. Controls the rank of the local tangent-space approximation. Default is 3, as used in the paper experiments. Must be <= feature_dim of data.x.

include_firstbool, optional

If False (default), discards eigenvectors whose eigenvalue is below eps (the trivial zero-eigenvectors / global sections of the sheaf).

concat_to_xbool, optional

If True (default), concatenates the PE with data.x. If False, stores it as data.SheafConnLapPE instead.

epsfloat, optional

Threshold below which eigenvalues are considered trivial. Default 1e-6.

**kwargs

Additional keyword arguments (unused; reserved for future extensions).

__init__(max_pe_dim, stalk_dim=3, include_first=False, concat_to_x=True, eps=1e-06, **kwargs)#
forward(data)#

Compute and attach the ConnLap PE to a graph data object.

Parameters:
dataData

Input graph. data.x must be set and data.x.shape[1] >= stalk_dim.

Returns:
Data

Graph with PE concatenated to data.x (concat_to_x=True) or stored in data.SheafConnLapPE (concat_to_x=False).

Raises:
ValueError

If data.x is None, or if feature_dim < stalk_dim.

class coo_matrix(arg1, shape=None, dtype=None, copy=False, *, maxprint=None)#

Bases: spmatrix, _coo_base

A sparse matrix in COOrdinate format.

Also known as the ‘ijv’ or ‘triplet’ format.

This can be instantiated in several ways:
coo_matrix(D)

where D is a 2-D ndarray

coo_matrix(S)

with another sparse array or matrix S (equivalent to S.tocoo())

coo_matrix((M, N), [dtype])

to construct an empty matrix with shape (M, N) dtype is optional, defaulting to dtype=’d’.

coo_matrix((data, (i, j)), [shape=(M, N)])
to construct from three arrays:
  1. data[:] the entries of the matrix, in any order

  2. i[:] the row indices of the matrix entries

  3. j[:] the column indices of the matrix entries

Where A[i[k], j[k]] = data[k]. When shape is not specified, it is inferred from the index arrays

Attributes:
dtypedtype

Data type of the matrix

shape2-tuple

Shape of the matrix

ndimint

Number of dimensions (this is always 2)

nnz

Number of stored values, including explicit zeros.

size

Number of stored values.

data

COO format data array of the matrix

row

COO format row index array of the matrix

col

COO format column index array of the matrix

has_canonical_formatbool

Whether the matrix has sorted indices and no duplicates

format

Format string for matrix.

T

Transpose.

Notes

Sparse matrices can be used in arithmetic operations: they support addition, subtraction, multiplication, division, and matrix power.

Advantages of the COO format
  • facilitates fast conversion among sparse formats

  • permits duplicate entries (see example)

  • very fast conversion to and from CSR/CSC formats

Disadvantages of the COO format
  • does not directly support:
    • arithmetic operations

    • slicing

Intended Usage
  • COO is a fast format for constructing sparse matrices

  • Once a COO matrix has been constructed, convert to CSR or CSC format for fast arithmetic and matrix vector operations

  • By default when converting to CSR or CSC format, duplicate (i,j) entries will be summed together. This facilitates efficient construction of finite element matrices and the like. (see example)

Canonical format
  • Entries and coordinates sorted by row, then column.

  • There are no duplicate entries (i.e. duplicate (i,j) locations)

  • Data arrays MAY have explicit zeros.

Examples

>>> # Constructing an empty matrix
>>> import numpy as np
>>> from scipy.sparse import coo_matrix
>>> coo_matrix((3, 4), dtype=np.int8).toarray()
array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]], dtype=int8)
>>> # Constructing a matrix using ijv format
>>> row  = np.array([0, 3, 1, 0])
>>> col  = np.array([0, 3, 1, 2])
>>> data = np.array([4, 5, 7, 9])
>>> coo_matrix((data, (row, col)), shape=(4, 4)).toarray()
array([[4, 0, 9, 0],
       [0, 7, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 5]])
>>> # Constructing a matrix with duplicate coordinates
>>> row  = np.array([0, 0, 1, 3, 1, 0, 0])
>>> col  = np.array([0, 2, 1, 3, 1, 0, 0])
>>> data = np.array([1, 1, 1, 1, 1, 1, 1])
>>> coo = coo_matrix((data, (row, col)), shape=(4, 4))
>>> # Duplicate coordinates are maintained until implicitly or explicitly summed
>>> np.max(coo.data)
1
>>> coo.toarray()
array([[3, 0, 1, 0],
       [0, 2, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 1]])
degree(index, num_nodes=None, dtype=None)#

Computes the (unweighted) degree of a given one-dimensional index tensor.

Parameters:
  • index (LongTensor) – Index tensor.

  • num_nodes (int, optional) – The number of nodes, i.e. max_val + 1 of index. (default: None)

  • dtype (torch.dtype, optional) – The desired data type of the returned tensor.

Return type:

Tensor

Example

>>> row = torch.tensor([0, 1, 0, 2, 0])
>>> degree(row, dtype=torch.long)
tensor([3, 1, 1])
remove_self_loops(edge_index, edge_attr=None)#

Removes every self-loop in the graph given by edge_index, so that \((i,i) \not\in \mathcal{E}\) for every \(i \in \mathcal{V}\).

Parameters:
  • edge_index (LongTensor) – The edge indices.

  • edge_attr (Tensor, optional) – Edge weights or multi-dimensional edge features. (default: None)

Return type:

(LongTensor, Tensor)

Example

>>> edge_index = torch.tensor([[0, 1, 0],
...                            [1, 0, 0]])
>>> edge_attr = [[1, 2], [3, 4], [5, 6]]
>>> edge_attr = torch.tensor(edge_attr)
>>> remove_self_loops(edge_index, edge_attr)
(tensor([[0, 1],
        [1, 0]]),
tensor([[1, 2],
        [3, 4]]))
sp_diags(diagonals, offsets=0, shape=None, format=None, dtype=<object object>)#

Construct a sparse matrix from diagonals.

Warning

This function returns a sparse matrix – not a sparse array. You are encouraged to use diags_array to take advantage of the sparse array functionality.

Parameters:
diagonalssequence of array_like

Sequence of arrays containing the matrix diagonals, corresponding to offsets.

offsetssequence of int or an int, optional
Diagonals to set (repeated offsets are not allowed):
  • k = 0 the main diagonal (default)

  • k > 0 the kth upper diagonal

  • k < 0 the kth lower diagonal

shapetuple of int, optional

Shape of the result. If omitted, a square matrix large enough to contain the diagonals is returned.

format{“dia”, “csr”, “csc”, “lil”, …}, optional

Matrix format of the result. By default (format=None) an appropriate sparse matrix format is returned. This choice is subject to change.

dtypedtype, optional

Data type of the matrix. If dtype is None, the output data type is determined by the data type of the input diagonals.

Up until SciPy 1.19, the default behavior will be to return a matrix with an inexact (floating point) data type. In particular, integer input will be converted to double precision floating point. This behavior is deprecated, and in SciPy 1.19, the default behavior will be changed to return a matrix with the same data type as the input diagonals. To adopt this behavior before version 1.19, use dtype=None.

Returns:
new_matrixdia_matrix

dia_matrix holding the values in diagonals offset from the main diagonal as indicated in offsets.

See also

spdiags

construct matrix from diagonals

diags_array

construct sparse array instead of sparse matrix

Notes

Repeated diagonal offsets are disallowed.

The result from diags is the sparse equivalent of:

np.diag(diagonals[0], offsets[0])
+ ...
+ np.diag(diagonals[k], offsets[k])

diags differs from dia_matrix in the way it handles off-diagonals. Specifically, dia_matrix assumes the data input includes padding (ignored values) at the start/end of the rows for positive/negative offset, while diags assumes the input data has no padding. Each value in the input diagonals is used.

Added in version 0.11.

Examples

>>> from scipy.sparse import diags
>>> diagonals = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0], [1.0, 2.0]]
>>> diags(diagonals, [0, -1, 2]).toarray()
array([[1., 0., 1., 0.],
       [1., 2., 0., 2.],
       [0., 2., 3., 0.],
       [0., 0., 3., 4.]])

Broadcasting of scalars is supported (but shape needs to be specified):

>>> diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(4, 4)).toarray()
array([[-2.,  1.,  0.,  0.],
       [ 1., -2.,  1.,  0.],
       [ 0.,  1., -2.,  1.],
       [ 0.,  0.,  1., -2.]])

If only one diagonal is wanted (as in numpy.diag), the following works as well:

>>> diags([1.0, 2.0, 3.0], 1).toarray()
array([[ 0.,  1.,  0.,  0.],
       [ 0.,  0.,  2.,  0.],
       [ 0.,  0.,  0.,  3.],
       [ 0.,  0.,  0.,  0.]])
svd(a, full_matrices=True, compute_uv=True, overwrite_a=False, check_finite=True, lapack_driver='gesdd')#

Singular Value Decomposition.

Factorizes the matrix a into two unitary matrices U and Vh, and a 1-D array s of singular values (real, non-negative) such that a == U @ S @ Vh, where S is a suitably shaped matrix of zeros with main diagonal s.

The documentation is written assuming array arguments are of specified “core” shapes. However, array argument(s) of this function may have additional “batch” dimensions prepended to the core shape. In this case, the array is treated as a batch of lower-dimensional slices; see Batched Linear Operations for details. Note that calls with zero-size batches are unsupported and will raise a ValueError.

Parameters:
a(M, N) array_like

Matrix to decompose.

full_matricesbool, optional

If True (default), U and Vh are of shape (M, M), (N, N). If False, the shapes are (M, K) and (K, N), where K = min(M, N).

compute_uvbool, optional

Whether to compute also U and Vh in addition to s. Default is True.

overwrite_abool, optional

Whether to overwrite a; may improve performance. Default is False.

check_finitebool, optional

Whether to check that the input matrix contains only finite numbers. Disabling may give a performance gain, but may result in problems (crashes, non-termination) if the inputs do contain infinities or NaNs.

lapack_driver{‘gesdd’, ‘gesvd’}, optional

Whether to use the more efficient divide-and-conquer approach ('gesdd') or general rectangular approach ('gesvd') to compute the SVD. MATLAB and Octave use the 'gesvd' approach. Default is 'gesdd'.

Returns:
Undarray

Unitary matrix having left singular vectors as columns. Of shape (M, M) or (M, K), depending on full_matrices.

sndarray

The singular values, sorted in non-increasing order. Of shape (K,), with K = min(M, N).

Vhndarray

Unitary matrix having right singular vectors as rows. Of shape (N, N) or (K, N) depending on full_matrices.

For compute_uv=False, only s is returned.
Raises:
LinAlgError

If SVD computation does not converge.

See also

svdvals()

Compute singular values of a matrix.

diagsvd()

Construct the Sigma matrix, given the vector s.

Examples

>>> import numpy as np
>>> from scipy import linalg
>>> rng = np.random.default_rng()
>>> m, n = 9, 6
>>> a = rng.standard_normal((m, n)) + 1.j*rng.standard_normal((m, n))
>>> U, s, Vh = linalg.svd(a)
>>> U.shape,  s.shape, Vh.shape
((9, 9), (6,), (6, 6))

Reconstruct the original matrix from the decomposition:

>>> sigma = np.zeros((m, n))
>>> for i in range(min(m, n)):
...     sigma[i, i] = s[i]
>>> a1 = np.dot(U, np.dot(sigma, Vh))
>>> np.allclose(a, a1)
True

Alternatively, use full_matrices=False (notice that the shape of U is then (m, n) instead of (m, m)):

>>> U, s, Vh = linalg.svd(a, full_matrices=False)
>>> U.shape, s.shape, Vh.shape
((9, 6), (6,), (6, 6))
>>> S = np.diag(s)
>>> np.allclose(a, np.dot(U, np.dot(S, Vh)))
True
>>> s2 = linalg.svd(a, compute_uv=False)
>>> np.allclose(s, s2)
True
to_undirected(edge_index, edge_attr='???', num_nodes=None, reduce='add')#

Converts the graph given by edge_index to an undirected graph such that \((j,i) \in \mathcal{E}\) for every edge \((i,j) \in \mathcal{E}\).

Parameters:
  • edge_index (LongTensor) – The edge indices.

  • edge_attr (Tensor or List[Tensor], optional) – Edge weights or multi- dimensional edge features. If given as a list, will remove duplicates for all its entries. (default: None)

  • num_nodes (int, optional) – The number of nodes, i.e. max(edge_index) + 1. (default: None)

  • reduce (str, optional) – The reduce operation to use for merging edge features ("add", "mean", "min", "max", "mul"). (default: "add")

Return type:

LongTensor if edge_attr is not passed, else (LongTensor, Optional[Tensor] or List[Tensor]])

Warning

From :pyg:`PyG >= 2.3.0` onwards, this function will always return a tuple whenever edge_attr is passed as an argument (even in case it is set to None).

Examples

>>> edge_index = torch.tensor([[0, 1, 1],
...                            [1, 0, 2]])
>>> to_undirected(edge_index)
tensor([[0, 1, 1, 2],
        [1, 0, 2, 1]])
>>> edge_index = torch.tensor([[0, 1, 1],
...                            [1, 0, 2]])
>>> edge_weight = torch.tensor([1., 1., 1.])
>>> to_undirected(edge_index, edge_weight)
(tensor([[0, 1, 1, 2],
        [1, 0, 2, 1]]),
tensor([2., 2., 1., 1.]))
>>> # Use 'mean' operation to merge edge features
>>>  to_undirected(edge_index, edge_weight, reduce='mean')
(tensor([[0, 1, 1, 2],
        [1, 0, 2, 1]]),
tensor([1., 1., 1., 1.]))