topobench.transforms.data_manipulations.laplacian_encodings module#

Laplacian Positional Encoding (LapPE) Transform.

class BaseTransform#

Bases: ABC

An abstract base class for writing transforms.

Transforms are a general way to modify and customize Data or HeteroData objects, either by implicitly passing them as an argument to a Dataset, or by applying them explicitly to individual Data or HeteroData objects:

import torch_geometric.transforms as T
from torch_geometric.datasets import TUDataset

transform = T.Compose([T.ToUndirected(), T.AddSelfLoops()])

dataset = TUDataset(path, name='MUTAG', transform=transform)
data = dataset[0]  # Implicitly transform data on every access.

data = TUDataset(path, name='MUTAG')[0]
data = transform(data)  # Explicitly transform data.
abstract forward(data)#
class Data(x=None, edge_index=None, edge_attr=None, y=None, pos=None, time=None, **kwargs)#

Bases: BaseData, FeatureStore, GraphStore

A data object describing a homogeneous graph. The data object can hold node-level, link-level and graph-level attributes. In general, Data tries to mimic the behavior of a regular :python:`Python` dictionary. In addition, it provides useful functionality for analyzing graph structures, and provides basic PyTorch tensor functionalities. See here for the accompanying tutorial.

from torch_geometric.data import Data

data = Data(x=x, edge_index=edge_index, ...)

# Add additional arguments to `data`:
data.train_idx = torch.tensor([...], dtype=torch.long)
data.test_mask = torch.tensor([...], dtype=torch.bool)

# Analyzing the graph structure:
data.num_nodes
>>> 23

data.is_directed()
>>> False

# PyTorch tensor functionality:
data = data.pin_memory()
data = data.to('cuda:0', non_blocking=True)
Parameters:
  • x (torch.Tensor, optional) – Node feature matrix with shape [num_nodes, num_node_features]. (default: None)

  • edge_index (LongTensor, optional) – Graph connectivity in COO format with shape [2, num_edges]. (default: None)

  • edge_attr (torch.Tensor, optional) – Edge feature matrix with shape [num_edges, num_edge_features]. (default: None)

  • y (torch.Tensor, optional) – Graph-level or node-level ground-truth labels with arbitrary shape. (default: None)

  • pos (torch.Tensor, optional) – Node position matrix with shape [num_nodes, num_dimensions]. (default: None)

  • time (torch.Tensor, optional) – The timestamps for each event with shape [num_edges] or [num_nodes]. (default: None)

  • **kwargs (optional) – Additional attributes.

__init__(x=None, edge_index=None, edge_attr=None, y=None, pos=None, time=None, **kwargs)#
connected_components()#

Extracts connected components of the graph using a union-find algorithm. The components are returned as a list of Data objects, where each object represents a connected component of the graph.

data = Data()
data.x = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
data.y = torch.tensor([[1.1], [2.1], [3.1], [4.1]])
data.edge_index = torch.tensor(
    [[0, 1, 2, 3], [1, 0, 3, 2]], dtype=torch.long
)

components = data.connected_components()
print(len(components))
>>> 2

print(components[0].x)
>>> Data(x=[2, 1], y=[2, 1], edge_index=[2, 2])
Returns:

A list of disconnected components.

Return type:

List[Data]

debug()#
edge_subgraph(subset)#

Returns the induced subgraph given by the edge indices subset. Will currently preserve all the nodes in the graph, even if they are isolated after subgraph computation.

Parameters:

subset (LongTensor or BoolTensor) – The edges to keep.

classmethod from_dict(mapping)#

Creates a Data object from a dictionary.

get_all_edge_attrs()#

Returns all registered edge attributes.

get_all_tensor_attrs()#

Obtains all feature attributes stored in Data.

is_edge_attr(key)#

Returns True if the object at key key denotes an edge-level tensor attribute.

is_node_attr(key)#

Returns True if the object at key key denotes a node-level tensor attribute.

stores_as(data)#
subgraph(subset)#

Returns the induced subgraph given by the node indices subset.

Parameters:

subset (LongTensor or BoolTensor) – The nodes to keep.

to_dict()#

Returns a dictionary of stored key/value pairs.

to_heterogeneous(node_type=None, edge_type=None, node_type_names=None, edge_type_names=None)#

Converts a Data object to a heterogeneous HeteroData object. For this, node and edge attributes are splitted according to the node-level and edge-level vectors node_type and edge_type, respectively. node_type_names and edge_type_names can be used to give meaningful node and edge type names, respectively. That is, the node_type 0 is given by node_type_names[0]. If the Data object was constructed via to_homogeneous(), the object can be reconstructed without any need to pass in additional arguments.

Parameters:
  • node_type (torch.Tensor, optional) – A node-level vector denoting the type of each node. (default: None)

  • edge_type (torch.Tensor, optional) – An edge-level vector denoting the type of each edge. (default: None)

  • node_type_names (List[str], optional) – The names of node types. (default: None)

  • edge_type_names (List[Tuple[str, str, str]], optional) – The names of edge types. (default: None)

to_namedtuple()#

Returns a NamedTuple of stored key/value pairs.

update(data)#

Updates the data object with the elements from another data object. Added elements will override existing ones (in case of duplicates).

validate(raise_on_error=True)#

Validates the correctness of the data.

property batch: Tensor | None#

!! processed by numpydoc !!

property edge_attr: Tensor | None#

!! processed by numpydoc !!

property edge_index: Tensor | None#

!! processed by numpydoc !!

property edge_stores: List[EdgeStorage]#

!! processed by numpydoc !!

property edge_weight: Tensor | None#

!! processed by numpydoc !!

property face: Tensor | None#

!! processed by numpydoc !!

property node_stores: List[NodeStorage]#

!! processed by numpydoc !!

property num_edge_features: int#

Returns the number of features per edge in the graph.

property num_edge_types: int#

Returns the number of edge types in the graph.

property num_faces: int | None#

Returns the number of faces in the mesh.

property num_features: int#

Returns the number of features per node in the graph. Alias for num_node_features.

property num_node_features: int#

Returns the number of features per node in the graph.

property num_node_types: int#

Returns the number of node types in the graph.

property num_nodes: int | None#

Returns the number of nodes in the graph.

Note

The number of nodes in the data object is automatically inferred in case node-level attributes are present, e.g., data.x. In some cases, however, a graph may only be given without any node-level attributes. :pyg:`PyG` then guesses the number of nodes according to edge_index.max().item() + 1. However, in case there exists isolated nodes, this number does not have to be correct which can result in unexpected behavior. Thus, we recommend to set the number of nodes in your data object explicitly via data.num_nodes = .... You will be given a warning that requests you to do so.

property pos: Tensor | None#

!! processed by numpydoc !!

property stores: List[BaseStorage]#

!! processed by numpydoc !!

property time: Tensor | None#

!! processed by numpydoc !!

property x: Tensor | None#

!! processed by numpydoc !!

property y: Tensor | int | float | None#

!! processed by numpydoc !!

class LapPE(max_pe_dim, include_eigenvalues=False, include_first=False, concat_to_x=True, **kwargs)#

Bases: BaseTransform

Laplacian Positional Encoding (LapPE) transform.

This computes the smallest eigenvectors of the normalized Laplacian matrix and appends them as node features (structural encodings). Optionally pads to a fixed dimension.

Parameters:
max_pe_dimint

Maximum number of eigenvectors to use (dimensionality of the encoding).

include_eigenvaluesbool, optional

If True, concatenates eigenvalues alongside eigenvectors. Shape then becomes [num_nodes, 2 * max_pe_dim]. Default is False.

include_firstbool, optional

If False, removes eigenvectors corresponding to (near-)zero eigenvalues (constant eigenvector in connected graphs). Default is False.

concat_to_xbool, optional

If True, concatenates the encodings with existing node features in data.x. If data.x is None, creates it. Default is True.

**kwargsdict

Additional arguments (not used).

__init__(max_pe_dim, include_eigenvalues=False, include_first=False, concat_to_x=True, **kwargs)#
forward(data)#

Compute the Laplacian positional encodings for the input graph.

Parameters:
dataData

Input graph data object.

Returns:
Data

Graph data object with Laplacian positional encodings added.

eigsh(A, k=6, M=None, sigma=None, which='LM', v0=None, ncv=None, maxiter=None, tol=0, return_eigenvectors=True, Minv=None, OPinv=None, mode='normal')#

Find k eigenvalues and eigenvectors of the real symmetric square matrix or complex Hermitian matrix A.

Solves A @ x[i] = w[i] * x[i], the standard eigenvalue problem for w[i] eigenvalues with corresponding eigenvectors x[i].

If M is specified, solves A @ x[i] = w[i] * M @ x[i], the generalized eigenvalue problem for w[i] eigenvalues with corresponding eigenvectors x[i].

Note that there is no specialized routine for the case when A is a complex Hermitian matrix. In this case, eigsh() will call eigs() and return the real parts of the eigenvalues thus obtained.

Parameters:
Andarray, sparse matrix or LinearOperator

A square operator representing the operation A @ x, where A is real symmetric or complex Hermitian. For buckling mode (see below) A must additionally be positive-definite.

kint, optional

The number of eigenvalues and eigenvectors desired. k must be smaller than N. It is not possible to compute all eigenvectors of a matrix.

Returns:
warray

Array of k eigenvalues.

varray

An array representing the k eigenvectors. The column v[:, i] is the eigenvector corresponding to the eigenvalue w[i].

Other Parameters:
MAn N x N matrix, array, sparse matrix, or linear operator representing

the operation M @ x for the generalized eigenvalue problem

A @ x = w * M @ x.

M must represent a real symmetric matrix if A is real, and must represent a complex Hermitian matrix if A is complex. For best results, the data type of M should be the same as that of A. Additionally:

If sigma is None, M is symmetric positive definite.

If sigma is specified, M is symmetric positive semi-definite.

In buckling mode, M is symmetric indefinite.

If sigma is None, eigsh requires an operator to compute the solution of the linear equation M @ x = b. This is done internally via a (sparse) LU decomposition for an explicit matrix M, or via an iterative solver for a general linear operator. Alternatively, the user can supply the matrix or operator Minv, which gives x = Minv @ b = M^-1 @ b.

sigmareal

Find eigenvalues near sigma using shift-invert mode. This requires an operator to compute the solution of the linear system [A - sigma * M] x = b, where M is the identity matrix if unspecified. This is computed internally via a (sparse) LU decomposition for explicit matrices A & M, or via an iterative solver if either A or M is a general linear operator. Alternatively, the user can supply the matrix or operator OPinv, which gives x = OPinv @ b = [A - sigma * M]^-1 @ b. Regardless of the selected mode (normal, cayley, or buckling), OPinv should always be supplied as OPinv = [A - sigma * M]^-1.

Note that when sigma is specified, the keyword ‘which’ refers to the shifted eigenvalues w'[i] where:

if mode == 'normal': w'[i] = 1 / (w[i] - sigma).

if mode == 'cayley': w'[i] = (w[i] + sigma) / (w[i] - sigma).

if mode == 'buckling': w'[i] = w[i] / (w[i] - sigma).

(see further discussion in ‘mode’ below)

v0ndarray, optional

Starting vector for iteration. Default: random

ncvint, optional

The number of Lanczos vectors generated ncv must be greater than k and smaller than n; it is recommended that ncv > 2*k. Default: min(n, max(2*k + 1, 20))

whichstr [‘LM’ | ‘SM’ | ‘LA’ | ‘SA’ | ‘BE’]

If A is a complex Hermitian matrix, ‘BE’ is invalid. Which k eigenvectors and eigenvalues to find:

‘LM’ : Largest (in magnitude) eigenvalues.

‘SM’ : Smallest (in magnitude) eigenvalues.

‘LA’ : Largest (algebraic) eigenvalues.

‘SA’ : Smallest (algebraic) eigenvalues.

‘BE’ : Half (k/2) from each end of the spectrum.

When k is odd, return one more (k/2+1) from the high end. When sigma != None, ‘which’ refers to the shifted eigenvalues w'[i] (see discussion in ‘sigma’, above). ARPACK is generally better at finding large values than small values. If small eigenvalues are desired, consider using shift-invert mode for better performance.

maxiterint, optional

Maximum number of Arnoldi update iterations allowed. Default: n*10

tolfloat

Relative accuracy for eigenvalues (stopping criterion). The default value of 0 implies machine precision.

MinvN x N matrix, array, sparse matrix, or LinearOperator

See notes in M, above.

OPinvN x N matrix, array, sparse matrix, or LinearOperator

See notes in sigma, above.

return_eigenvectorsbool

Return eigenvectors (True) in addition to eigenvalues. This value determines the order in which eigenvalues are sorted. The sort order is also dependent on the which variable.

For which = ‘LM’ or ‘SA’:

If return_eigenvectors is True, eigenvalues are sorted by algebraic value.

If return_eigenvectors is False, eigenvalues are sorted by absolute value.

For which = ‘BE’ or ‘LA’:

eigenvalues are always sorted by algebraic value.

For which = ‘SM’:

If return_eigenvectors is True, eigenvalues are sorted by algebraic value.

If return_eigenvectors is False, eigenvalues are sorted by decreasing absolute value.

modestring [‘normal’ | ‘buckling’ | ‘cayley’]

Specify strategy to use for shift-invert mode. This argument applies only for real-valued A and sigma != None. For shift-invert mode, ARPACK internally solves the eigenvalue problem OP @ x'[i] = w'[i] * B @ x'[i] and transforms the resulting Ritz vectors x’[i] and Ritz values w’[i] into the desired eigenvectors and eigenvalues of the problem A @ x[i] = w[i] * M @ x[i]. The modes are as follows:

‘normal’ :

OP = [A - sigma * M]^-1 @ M, B = M, w’[i] = 1 / (w[i] - sigma)

‘buckling’ :

OP = [A - sigma * M]^-1 @ A, B = A, w’[i] = w[i] / (w[i] - sigma)

‘cayley’ :

OP = [A - sigma * M]^-1 @ [A + sigma * M], B = M, w’[i] = (w[i] + sigma) / (w[i] - sigma)

The choice of mode will affect which eigenvalues are selected by the keyword ‘which’, and can also impact the stability of convergence (see [2] for a discussion).

Raises:
ArpackNoConvergence

When the requested convergence is not obtained.

The currently converged eigenvalues and eigenvectors can be found as eigenvalues and eigenvectors attributes of the exception object.

See also

eigs

eigenvalues and eigenvectors for a general (nonsymmetric) matrix A

svds

singular value decomposition for a matrix A

Notes

This function is a wrapper to the ARPACK [1] SSEUPD and DSEUPD functions which use the Implicitly Restarted Lanczos Method to find the eigenvalues and eigenvectors [2].

References

[1]

ARPACK Software, opencollab/arpack-ng

[2]

R. B. Lehoucq, D. C. Sorensen, and C. Yang, ARPACK USERS GUIDE: Solution of Large Scale Eigenvalue Problems by Implicitly Restarted Arnoldi Methods. SIAM, Philadelphia, PA, 1998.

Examples

>>> import numpy as np
>>> from scipy.sparse.linalg import eigsh
>>> identity = np.eye(13)
>>> eigenvalues, eigenvectors = eigsh(identity, k=6)
>>> eigenvalues
array([1., 1., 1., 1., 1., 1.])
>>> eigenvectors.shape
(13, 6)
get_laplacian(edge_index, edge_weight=None, normalization=None, dtype=None, num_nodes=None)#

Computes the graph Laplacian of the graph given by edge_index and optional edge_weight.

Parameters:
  • edge_index (LongTensor) – The edge indices.

  • edge_weight (Tensor, optional) – One-dimensional edge weights. (default: None)

  • normalization (str, optional) –

    The normalization scheme for the graph Laplacian (default: None):

    1. None: No normalization \(\mathbf{L} = \mathbf{D} - \mathbf{A}\)

    2. "sym": Symmetric normalization \(\mathbf{L} = \mathbf{I} - \mathbf{D}^{-1/2} \mathbf{A} \mathbf{D}^{-1/2}\)

    3. "rw": Random-walk normalization \(\mathbf{L} = \mathbf{I} - \mathbf{D}^{-1} \mathbf{A}\)

  • dtype (torch.dtype, optional) – The desired data type of returned tensor in case edge_weight=None. (default: None)

  • num_nodes (int, optional) – The number of nodes, i.e. max_val + 1 of edge_index. (default: None)

Examples

>>> edge_index = torch.tensor([[0, 1, 1, 2],
...                            [1, 0, 2, 1]])
>>> edge_weight = torch.tensor([1., 2., 2., 4.])
>>> # No normalization
>>> lap = get_laplacian(edge_index, edge_weight)
>>> # Symmetric normalization
>>> lap_sym = get_laplacian(edge_index, edge_weight,
                            normalization='sym')
>>> # Random-walk normalization
>>> lap_rw = get_laplacian(edge_index, edge_weight, normalization='rw')
to_scipy_sparse_matrix(edge_index, edge_attr=None, num_nodes=None)#

Converts a graph given by edge indices and edge attributes to a scipy sparse matrix.

Parameters:
  • edge_index (LongTensor) – The edge indices.

  • edge_attr (Tensor, optional) – Edge weights or multi-dimensional edge features. (default: None)

  • num_nodes (int, optional) – The number of nodes, i.e. max_val + 1 of index. (default: None)

Examples

>>> edge_index = torch.tensor([
...     [0, 1, 1, 2, 2, 3],
...     [1, 0, 2, 1, 3, 2],
... ])
>>> to_scipy_sparse_matrix(edge_index)
<4x4 sparse matrix of type '<class 'numpy.float32'>'
    with 6 stored elements in COOrdinate format>