topobench.transforms.data_manipulations.hk_feature_encodings module#

Heat Kernel feature Encoding (HKFE) Transform (Debug Version).

class BaseTransform#

Bases: ABC

An abstract base class for writing transforms.

Transforms are a general way to modify and customize Data or HeteroData objects, either by implicitly passing them as an argument to a Dataset, or by applying them explicitly to individual Data or HeteroData objects:

import torch_geometric.transforms as T
from torch_geometric.datasets import TUDataset

transform = T.Compose([T.ToUndirected(), T.AddSelfLoops()])

dataset = TUDataset(path, name='MUTAG', transform=transform)
data = dataset[0]  # Implicitly transform data on every access.

data = TUDataset(path, name='MUTAG')[0]
data = transform(data)  # Explicitly transform data.

abstractmethod forward(data)#

class Data(x=None, edge_index=None, edge_attr=None, y=None, pos=None, time=None, **kwargs)#

Bases: BaseData, FeatureStore, GraphStore

A data object describing a homogeneous graph. The data object can hold node-level, link-level and graph-level attributes. In general, Data tries to mimic the behavior of a regular :python:`Python` dictionary. In addition, it provides useful functionality for analyzing graph structures, and provides basic PyTorch tensor functionalities. See here for the accompanying tutorial.

from torch_geometric.data import Data

data = Data(x=x, edge_index=edge_index, ...)

# Add additional arguments to `data`:
data.train_idx = torch.tensor([...], dtype=torch.long)
data.test_mask = torch.tensor([...], dtype=torch.bool)

# Analyzing the graph structure:
data.num_nodes
>>> 23

data.is_directed()
>>> False

# PyTorch tensor functionality:
data = data.pin_memory()
data = data.to('cuda:0', non_blocking=True)

Parameters:

x (torch.Tensor, optional) – Node feature matrix with shape [num_nodes, num_node_features]. (default: None)
edge_index (LongTensor, optional) – Graph connectivity in COO format with shape [2, num_edges]. (default: None)
edge_attr (torch.Tensor, optional) – Edge feature matrix with shape [num_edges, num_edge_features]. (default: None)
y (torch.Tensor, optional) – Graph-level or node-level ground-truth labels with arbitrary shape. (default: None)
pos (torch.Tensor, optional) – Node position matrix with shape [num_nodes, num_dimensions]. (default: None)
time (torch.Tensor, optional) – The timestamps for each event with shape [num_edges] or [num_nodes]. (default: None)
**kwargs (optional) – Additional attributes.

classmethod from_dict(mapping)#

Creates a Data object from a dictionary.

__init__(x=None, edge_index=None, edge_attr=None, y=None, pos=None, time=None, **kwargs)#

connected_components()#

Extracts connected components of the graph using a union-find algorithm. The components are returned as a list of Data objects, where each object represents a connected component of the graph.

data = Data()
data.x = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
data.y = torch.tensor([[1.1], [2.1], [3.1], [4.1]])
data.edge_index = torch.tensor(
    [[0, 1, 2, 3], [1, 0, 3, 2]], dtype=torch.long
)

components = data.connected_components()
print(len(components))
>>> 2

print(components[0].x)
>>> Data(x=[2, 1], y=[2, 1], edge_index=[2, 2])

Returns:: A list of disconnected components.
Return type:: List[Data]

debug()#

edge_subgraph(subset)#

Returns the induced subgraph given by the edge indices subset. Will currently preserve all the nodes in the graph, even if they are isolated after subgraph computation.

Parameters:: subset (LongTensor or BoolTensor) – The edges to keep.

get_all_edge_attrs()#

Returns all registered edge attributes.

get_all_tensor_attrs()#

Obtains all feature attributes stored in Data.

is_edge_attr(key)#

Returns True if the object at key key denotes an edge-level tensor attribute.

is_node_attr(key)#

Returns True if the object at key key denotes a node-level tensor attribute.

stores_as(data)#

subgraph(subset)#

Returns the induced subgraph given by the node indices subset.

Parameters:: subset (LongTensor or BoolTensor) – The nodes to keep.

to_dict()#

Returns a dictionary of stored key/value pairs.

to_heterogeneous(node_type=None, edge_type=None, node_type_names=None, edge_type_names=None)#

Converts a Data object to a heterogeneous HeteroData object. For this, node and edge attributes are splitted according to the node-level and edge-level vectors node_type and edge_type, respectively. node_type_names and edge_type_names can be used to give meaningful node and edge type names, respectively. That is, the node_type 0 is given by node_type_names[0]. If the Data object was constructed via to_homogeneous(), the object can be reconstructed without any need to pass in additional arguments.

Parameters:

node_type (torch.Tensor, optional) – A node-level vector denoting the type of each node. (default: None)
edge_type (torch.Tensor, optional) – An edge-level vector denoting the type of each edge. (default: None)
node_type_names (List[str], optional) – The names of node types. (default: None)
edge_type_names (List[Tuple[str, str, str]], optional) – The names of edge types. (default: None)

to_namedtuple()#

Returns a NamedTuple of stored key/value pairs.

update(data)#

Updates the data object with the elements from another data object. Added elements will override existing ones (in case of duplicates).

validate(raise_on_error=True)#

Validates the correctness of the data.

property batch: Tensor | None#: !! processed by numpydoc !!

property edge_attr: Tensor | None#: !! processed by numpydoc !!

property edge_index: Tensor | None#: !! processed by numpydoc !!

property edge_stores: List[EdgeStorage]#: !! processed by numpydoc !!

property edge_weight: Tensor | None#: !! processed by numpydoc !!

property face: Tensor | None#: !! processed by numpydoc !!

property node_stores: List[NodeStorage]#: !! processed by numpydoc !!

property num_edge_features: int#: Returns the number of features per edge in the graph.

property num_edge_types: int#: Returns the number of edge types in the graph.

property num_faces: int | None#: Returns the number of faces in the mesh.

property num_features: int#: Returns the number of features per node in the graph. Alias for num_node_features.

property num_node_features: int#: Returns the number of features per node in the graph.

property num_node_types: int#: Returns the number of node types in the graph.

property num_nodes: int | None#: Returns the number of nodes in the graph.

Note

The number of nodes in the data object is automatically inferred in case node-level attributes are present, e.g., data.x. In some cases, however, a graph may only be given without any node-level attributes. :pyg:`PyG` then guesses the number of nodes according to edge_index.max().item() + 1. However, in case there exists isolated nodes, this number does not have to be correct which can result in unexpected behavior. Thus, we recommend to set the number of nodes in your data object explicitly via data.num_nodes = .... You will be given a warning that requests you to do so.

property pos: Tensor | None#: !! processed by numpydoc !!

property stores: List[BaseStorage]#: !! processed by numpydoc !!

property time: Tensor | None#: !! processed by numpydoc !!

property x: Tensor | None#: !! processed by numpydoc !!

property y: Tensor | int | float | None#: !! processed by numpydoc !!

class HKFE(kernel_param_HKFE, concat_to_x=True, aggregation='mean', method='approx', cheb_order=10, debug=False, **kwargs)#

Bases: BaseTransform

Heat Kernel Feature Encodings (HKFE) transform.

Parameters:

kernel_param_HKFEtuple of int: Tuple specifying the start and end diffusion times for the heat kernel.
concat_to_xbool, optional: If True, concatenates encodings with existing node features in data.x. Default is True.
aggregationstr, optional: Aggregation function to reduce over the feature dimension. Options: “mean”, “sum”, “max”, “min”. Default is “mean”.
methodstr, optional: Computation method: “exact” or “approx”. Default is “approx”.
cheb_orderint, optional: The order of the Chebyshev polynomial. Default is 10.
debugbool, optional: If True, runs both exact and approx methods, compares their outputs, and prints the timing and error metrics. Default is False.
**kwargsdict: Additional arguments (not used).

__init__(kernel_param_HKFE, concat_to_x=True, aggregation='mean', method='approx', cheb_order=10, debug=False, **kwargs)#

forward(data)#

Compute the HKFE for the input graph.

Parameters:

datatorch_geometric.data.Data: Input graph data object.

Returns:

torch_geometric.data.Data: Graph data object with HKFE added to data.x or data.HKFE.

expm_multiply(A, B, start=None, stop=None, num=None, endpoint=None, traceA=None)#

Compute the action of the matrix exponential of A on B.

Parameters:

Atransposable linear operator: The operator whose exponential is of interest.
Bndarray, sparse array: The matrix or vector to be multiplied by the matrix exponential of A.
startscalar, optional: The starting time point of the sequence.
stopscalar, optional: The end time point of the sequence, unless endpoint is set to False. In that case, the sequence consists of all but the last of num + 1 evenly spaced time points, so that stop is excluded. Note that the step size changes when endpoint is False.
numint, optional: Number of time points to use.
endpointbool, optional: If True, stop is the last time point. Otherwise, it is not included.
traceAscalar, optional: Trace of A. If not given the trace is estimated for linear operators, or calculated exactly for sparse matrices. It is used to precondition A, thus an approximate trace is acceptable. For linear operators, traceA should be provided to ensure performance as the estimation is not guaranteed to be reliable for all cases.

Added in version 1.9.0.

Returns:

expm_A_Bndarray: The result of the action \(e^{t_k A} B\).

Warns:

UserWarning: If A is a linear operator and traceA=None (default).

Notes

The optional arguments defining the sequence of evenly spaced time points are compatible with the arguments of numpy.linspace.

The output ndarray shape is somewhat complicated so I explain it here. The ndim of the output could be either 1, 2, or 3. It would be 1 if you are computing the expm action on a single vector at a single time point. It would be 2 if you are computing the expm action on a vector at multiple time points, or if you are computing the expm action on a matrix at a single time point. It would be 3 if you want the action on a matrix with multiple columns at multiple time points. If multiple time points are requested, expm_A_B[0] will always be the action of the expm at the first time point, regardless of whether the action is on a vector or a matrix.

References

[1]

Awad H. Al-Mohy and Nicholas J. Higham (2011) “Computing the Action of the Matrix Exponential, with an Application to Exponential Integrators.” SIAM Journal on Scientific Computing, 33 (2). pp. 488-511. ISSN 1064-8275 http://eprints.ma.man.ac.uk/1591/

[2]

Nicholas J. Higham and Awad H. Al-Mohy (2010) “Computing Matrix Functions.” Acta Numerica, 19. 159-208. ISSN 0962-4929 http://eprints.ma.man.ac.uk/1451/

Examples

>>> import numpy as np
>>> from scipy.sparse import csc_array
>>> from scipy.sparse.linalg import expm, expm_multiply
>>> A = csc_array([[1, 0], [0, 1]])
>>> A.toarray()
array([[1, 0],
       [0, 1]], dtype=int64)
>>> B = np.array([np.exp(-1.), np.exp(-2.)])
>>> B
array([ 0.36787944,  0.13533528])
>>> expm_multiply(A, B, start=1, stop=2, num=3, endpoint=True)
array([[ 1.        ,  0.36787944],
       [ 1.64872127,  0.60653066],
       [ 2.71828183,  1.        ]])
>>> expm(A).dot(B)                  # Verify 1st timestep
array([ 1.        ,  0.36787944])
>>> expm(1.5*A).dot(B)              # Verify 2nd timestep
array([ 1.64872127,  0.60653066])
>>> expm(2*A).dot(B)                # Verify 3rd timestep
array([ 2.71828183,  1.        ])

get_laplacian(edge_index, edge_weight=None, normalization=None, dtype=None, num_nodes=None)#

Computes the graph Laplacian of the graph given by edge_index and optional edge_weight.

Parameters:

edge_index (LongTensor) – The edge indices.
edge_weight (Tensor, optional) – One-dimensional edge weights. (default: None)
normalization (str, optional) –
The normalization scheme for the graph Laplacian (default: None):

1. None: No normalization \(\mathbf{L} = \mathbf{D} - \mathbf{A}\)

2. "sym": Symmetric normalization \(\mathbf{L} = \mathbf{I} - \mathbf{D}^{-1/2} \mathbf{A} \mathbf{D}^{-1/2}\)

3. "rw": Random-walk normalization \(\mathbf{L} = \mathbf{I} - \mathbf{D}^{-1} \mathbf{A}\)
dtype (torch.dtype, optional) – The desired data type of returned tensor in case edge_weight=None. (default: None)
num_nodes (int, optional) – The number of nodes, i.e. max_val + 1 of edge_index. (default: None)

Examples

>>> edge_index = torch.tensor([[0, 1, 1, 2],
...                            [1, 0, 2, 1]])
>>> edge_weight = torch.tensor([1., 2., 2., 4.])

>>> # No normalization
>>> lap = get_laplacian(edge_index, edge_weight)

>>> # Symmetric normalization
>>> lap_sym = get_laplacian(edge_index, edge_weight,
                            normalization='sym')

>>> # Random-walk normalization
>>> lap_rw = get_laplacian(edge_index, edge_weight, normalization='rw')

to_scipy_sparse_matrix(edge_index, edge_attr=None, num_nodes=None)#

Converts a graph given by edge indices and edge attributes to a scipy sparse matrix.

Parameters:

edge_index (LongTensor) – The edge indices.
edge_attr (Tensor, optional) – Edge weights or multi-dimensional edge features. (default: None)
num_nodes (int, optional) – The number of nodes, i.e. max_val + 1 of index. (default: None)

Examples

>>> edge_index = torch.tensor([
...     [0, 1, 1, 2, 2, 3],
...     [1, 0, 2, 1, 3, 2],
... ])
>>> to_scipy_sparse_matrix(edge_index)
<4x4 sparse matrix of type '<class 'numpy.float32'>'
    with 6 stored elements in COOrdinate format>