topobench.transforms.data_manipulations.ppr_feature_encodings module#

Personalized Page Rank Feature Encoding (PPRFE) Transform.

class BaseTransform#

Bases: ABC

An abstract base class for writing transforms.

Transforms are a general way to modify and customize Data or HeteroData objects, either by implicitly passing them as an argument to a Dataset, or by applying them explicitly to individual Data or HeteroData objects:

import torch_geometric.transforms as T
from torch_geometric.datasets import TUDataset

transform = T.Compose([T.ToUndirected(), T.AddSelfLoops()])

dataset = TUDataset(path, name='MUTAG', transform=transform)
data = dataset[0]  # Implicitly transform data on every access.

data = TUDataset(path, name='MUTAG')[0]
data = transform(data)  # Explicitly transform data.

abstractmethod forward(data)#
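The pattern described above (an abstract base class whose subclasses implement forward, and which can be chained) can be sketched in plain Python. This is only an illustration: SketchTransform, AddConstantFeature and ComposeSketch are hypothetical names, a dict stands in for a Data object, and the shallow copy in __call__ is an assumption of this sketch rather than a statement about the library's implementation.

```python
import copy
from abc import ABC, abstractmethod


class SketchTransform(ABC):
    """Hypothetical stand-in for a transform base class (not the library class)."""

    def __call__(self, data):
        # Work on a shallow copy so keys added by the transform
        # do not appear on the caller's object.
        return self.forward(copy.copy(data))

    @abstractmethod
    def forward(self, data):
        """Return the transformed data object."""


class AddConstantFeature(SketchTransform):
    """Hypothetical transform that stores a constant under a new key."""

    def __init__(self, key, value):
        self.key = key
        self.value = value

    def forward(self, data):
        data[self.key] = self.value
        return data


class ComposeSketch(SketchTransform):
    """Applies a list of transforms in order."""

    def __init__(self, transforms):
        self.transforms = transforms

    def forward(self, data):
        for transform in self.transforms:
            data = transform(data)
        return data


graph = {"num_nodes": 4}  # a plain dict standing in for a Data object
pipeline = ComposeSketch([AddConstantFeature("a", 1), AddConstantFeature("b", 2)])
result = pipeline(graph)
```

Because forward is abstract, instantiating SketchTransform directly raises a TypeError; only concrete subclasses can be used in a pipeline.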

class Data(x=None, edge_index=None, edge_attr=None, y=None, pos=None, time=None, **kwargs)#

Bases: BaseData, FeatureStore, GraphStore

A data object describing a homogeneous graph. The data object can hold node-level, link-level and graph-level attributes. In general, Data tries to mimic the behavior of a regular Python dictionary. In addition, it provides useful functionality for analyzing graph structures, and provides basic PyTorch tensor functionalities. An accompanying tutorial is available in the PyG documentation.

from torch_geometric.data import Data

data = Data(x=x, edge_index=edge_index, ...)

# Add additional arguments to `data`:
data.train_idx = torch.tensor([...], dtype=torch.long)
data.test_mask = torch.tensor([...], dtype=torch.bool)

# Analyzing the graph structure:
data.num_nodes
>>> 23

data.is_directed()
>>> False

# PyTorch tensor functionality:
data = data.pin_memory()
data = data.to('cuda:0', non_blocking=True)

Parameters:

x (torch.Tensor, optional) – Node feature matrix with shape [num_nodes, num_node_features]. (default: None)
edge_index (LongTensor, optional) – Graph connectivity in COO format with shape [2, num_edges]. (default: None)
edge_attr (torch.Tensor, optional) – Edge feature matrix with shape [num_edges, num_edge_features]. (default: None)
y (torch.Tensor, optional) – Graph-level or node-level ground-truth labels with arbitrary shape. (default: None)
pos (torch.Tensor, optional) – Node position matrix with shape [num_nodes, num_dimensions]. (default: None)
time (torch.Tensor, optional) – The timestamps for each event with shape [num_edges] or [num_nodes]. (default: None)
**kwargs (optional) – Additional attributes.

classmethod from_dict(mapping)#

Creates a Data object from a dictionary.

__init__(x=None, edge_index=None, edge_attr=None, y=None, pos=None, time=None, **kwargs)#

connected_components()#

Extracts connected components of the graph using a union-find algorithm. The components are returned as a list of Data objects, where each object represents a connected component of the graph.

data = Data()
data.x = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
data.y = torch.tensor([[1.1], [2.1], [3.1], [4.1]])
data.edge_index = torch.tensor(
    [[0, 1, 2, 3], [1, 0, 3, 2]], dtype=torch.long
)

components = data.connected_components()
print(len(components))
>>> 2

print(components[0])
>>> Data(x=[2, 1], y=[2, 1], edge_index=[2, 2])

Returns: A list of disconnected components.
Return type: List[Data]
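The union-find idea mentioned above can be sketched in plain Python on an edge list. This is a minimal illustration, not the library's implementation: the function name find_components is hypothetical, and it returns lists of node ids instead of Data objects.

```python
def find_components(num_nodes, edges):
    """Group node ids into connected components with union-find."""
    parent = list(range(num_nodes))

    def find(node):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for src, dst in edges:
        root_src, root_dst = find(src), find(dst)
        if root_src != root_dst:
            parent[root_dst] = root_src  # merge the two sets

    groups = {}
    for node in range(num_nodes):
        groups.setdefault(find(node), []).append(node)
    return sorted(groups.values())


# Same connectivity as the example above: edges 0<->1 and 2<->3.
edges = list(zip([0, 1, 2, 3], [1, 0, 3, 2]))
components = find_components(4, edges)
```

Here components holds two groups, [0, 1] and [2, 3], which matches the two components reported in the example above.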

debug()#

edge_subgraph(subset)#

Returns the induced subgraph given by the edge indices subset. This currently preserves all nodes in the graph, even if they become isolated after the subgraph is computed.

Parameters: subset (LongTensor or BoolTensor) – The edges to keep.

get_all_edge_attrs()#

Returns all registered edge attributes.

get_all_tensor_attrs()#

Obtains all feature attributes stored in Data.

is_edge_attr(key)#

Returns True if the object stored under the given key denotes an edge-level tensor attribute.

is_node_attr(key)#

Returns True if the object stored under the given key denotes a node-level tensor attribute.

stores_as(data)#

subgraph(subset)#

Returns the induced subgraph given by the node indices subset.

Parameters: subset (LongTensor or BoolTensor) – The nodes to keep.
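A node-induced subgraph keeps only the edges whose two endpoints are both in the subset. The sketch below illustrates this on a plain edge list; the function name induced_subgraph is hypothetical, and relabeling the kept nodes to consecutive ids is an assumption of this sketch.

```python
def induced_subgraph(subset, edges):
    """Keep edges with both endpoints in `subset`, relabeling nodes to 0..k-1."""
    # Assumption of this sketch: kept nodes are renumbered in sorted order.
    relabel = {node: new_id for new_id, node in enumerate(sorted(subset))}
    return [
        (relabel[src], relabel[dst])
        for src, dst in edges
        if src in relabel and dst in relabel
    ]


cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]  # directed 4-cycle
kept_edges = induced_subgraph({1, 2, 3}, cycle)
```

Dropping node 0 removes the two edges that touch it, so only the edges 1->2 and 2->3 survive, renumbered as (0, 1) and (1, 2).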

to_dict()#

Returns a dictionary of stored key/value pairs.

to_heterogeneous(node_type=None, edge_type=None, node_type_names=None, edge_type_names=None)#

Converts a Data object to a heterogeneous HeteroData object. For this, node and edge attributes are split according to the node-level and edge-level vectors node_type and edge_type, respectively. node_type_names and edge_type_names can be used to give meaningful node and edge type names, respectively. That is, the node_type 0 is given by node_type_names[0]. If the Data object was constructed via to_homogeneous(), the object can be reconstructed without any need to pass in additional arguments.

Parameters:

node_type (torch.Tensor, optional) – A node-level vector denoting the type of each node. (default: None)
edge_type (torch.Tensor, optional) – An edge-level vector denoting the type of each edge. (default: None)
node_type_names (List[str], optional) – The names of node types. (default: None)
edge_type_names (List[Tuple[str, str, str]], optional) – The names of edge types. (default: None)

to_namedtuple()#

Returns a NamedTuple of stored key/value pairs.

update(data)#

Updates the data object with the elements from another data object. Added elements will override existing ones (in case of duplicates).

validate(raise_on_error=True)#

Validates the correctness of the data.

property batch: Tensor | None#

property edge_attr: Tensor | None#

property edge_index: Tensor | None#

property edge_stores: List[EdgeStorage]#

property edge_weight: Tensor | None#

property face: Tensor | None#

property node_stores: List[NodeStorage]#

property num_edge_features: int#: Returns the number of features per edge in the graph.

property num_edge_types: int#: Returns the number of edge types in the graph.

property num_faces: int | None#: Returns the number of faces in the mesh.

property num_features: int#: Returns the number of features per node in the graph. Alias for num_node_features.

property num_node_features: int#: Returns the number of features per node in the graph.

property num_node_types: int#: Returns the number of node types in the graph.

property num_nodes: int | None#: Returns the number of nodes in the graph.

Note

The number of nodes in the data object is automatically inferred in case node-level attributes are present, e.g., data.x. In some cases, however, a graph may only be given without any node-level attributes. PyG then guesses the number of nodes according to edge_index.max().item() + 1. However, if the graph contains isolated nodes, this number may be incorrect, which can result in unexpected behavior. Thus, we recommend setting the number of nodes in your data object explicitly via data.num_nodes = .... You will be given a warning that requests you to do so.
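The undercount described in this note can be reproduced with plain lists. The values below are made up for illustration: a graph with five nodes in which nodes 3 and 4 have no edges.

```python
# COO connectivity as two lists: sources and targets.
edge_index = [[0, 1, 1], [1, 0, 2]]

# Ground truth for this toy graph: nodes 3 and 4 exist but are isolated.
true_num_nodes = 5

# The guess described above: largest node id seen in any edge, plus one.
inferred_num_nodes = max(max(edge_index[0]), max(edge_index[1])) + 1

# How many isolated nodes the guess fails to account for.
missing = true_num_nodes - inferred_num_nodes
```

The guess yields 3 and misses the two isolated nodes, which is why setting num_nodes explicitly is the safer option when no node-level attributes are stored.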

property pos: Tensor | None#

property stores: List[BaseStorage]#

property time: Tensor | None#

property x: Tensor | None#

property y: Tensor | int | float | None#

class PPRFE(alpha_param_PPRFE, concat_to_x=True, aggregation='mean', self_loop=True, method='approx', appnp_K=20, debug=False, **kwargs)#

Bases: BaseTransform

Personalized Page Rank Feature Encodings (PPRFE) transform.

Parameters:

alpha_param_PPRFE (tuple of float) – Tuple specifying the start and end teleport probabilities (alpha values).
concat_to_x (bool, optional) – If True, concatenates the encodings with existing node features. Default is True.
aggregation (str, optional) – Aggregation function to reduce over the feature dimension. Options: “mean”, “sum”, “max”, “min”. Default is “mean”.
self_loop (bool, optional) – If True, adds self-loops to the adjacency matrix. Default is True.
method (str, optional) – Computation method: “exact” or “approx”. Default is “approx”.
appnp_K (int, optional) – Number of polynomial expansion terms (propagation steps) for the approx method. Higher values capture more global information but are slower. Default is 20.
debug (bool, optional) – If True, runs both methods and prints error/timing metrics. Default is False.
**kwargs (dict) – Additional arguments (not used).

__init__(alpha_param_PPRFE, concat_to_x=True, aggregation='mean', self_loop=True, method='approx', appnp_K=20, debug=False, **kwargs)#

forward(data)#

Compute the PPR feature encodings for the input graph.

Parameters:

data (torch_geometric.data.Data) – Input graph data object.

Returns:

torch_geometric.data.Data – Graph data object with PPR feature encodings added.
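To give a feel for what an approximate method with a fixed number of propagation steps computes, here is a generic personalized PageRank power iteration in plain Python. It is a sketch under stated assumptions, not the PPRFE implementation: the function name ppr_power_iteration is hypothetical, the adjacency is row-normalized, a single source node is used, and how PPRFE normalizes the graph, sweeps alpha values, or aggregates features is not shown in this page.

```python
def ppr_power_iteration(adjacency, source, alpha, num_steps=20):
    """Approximate the personalized PageRank vector of one source node."""
    n = len(adjacency)

    # Row-normalize the adjacency into a random-walk transition matrix.
    transition = []
    for row in adjacency:
        total = sum(row)
        transition.append([value / total if total else 0.0 for value in row])

    # Teleport distribution: all mass on the source node.
    teleport = [1.0 if i == source else 0.0 for i in range(n)]
    scores = teleport[:]

    for _ in range(num_steps):
        # One walk step: propagated[i] = sum_j transition[j][i] * scores[j].
        propagated = [
            sum(transition[j][i] * scores[j] for j in range(n)) for i in range(n)
        ]
        # Teleport back to the source with probability alpha.
        scores = [
            alpha * teleport[i] + (1.0 - alpha) * propagated[i] for i in range(n)
        ]
    return scores


# Path graph 0 - 1 - 2 with self-loops, so no row sums to zero.
adjacency = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
scores = ppr_power_iteration(adjacency, source=0, alpha=0.15, num_steps=20)
```

Each extra step shrinks the distance to the fixed point by at least a factor of (1 - alpha), which is the trade-off between accuracy and cost that a step-count parameter such as appnp_K controls.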

add_self_loops(edge_index, edge_attr=None, fill_value=None, num_nodes=None)#

Adds a self-loop \((i,i) \in \mathcal{E}\) to every node \(i \in \mathcal{V}\) in the graph given by edge_index. In case the graph is weighted or has multi-dimensional edge features (edge_attr != None), edge features of self-loops will be added according to fill_value.

Parameters:

edge_index (LongTensor) – The edge indices.
edge_attr (Tensor, optional) – Edge weights or multi-dimensional edge features. (default: None)
fill_value (float or Tensor or str, optional) – The way to generate edge features of self-loops (in case edge_attr != None). If given as float or torch.Tensor, edge features of self-loops will be directly given by fill_value. If given as str, edge features of self-loops are computed by aggregating all features of edges that point to the specific node, according to a reduce operation ("add", "mean", "min", "max", or "mul"). (default: 1.)
num_nodes (int or Tuple[int, int], optional) – The number of nodes, i.e. max_val + 1 of edge_index. If given as a tuple, then edge_index is interpreted as a bipartite graph with shape (num_src_nodes, num_dst_nodes). (default: None)

Return type:

(LongTensor, Tensor)

Examples

>>> edge_index = torch.tensor([[0, 1, 0],
...                            [1, 0, 0]])
>>> edge_weight = torch.tensor([0.5, 0.5, 0.5])
>>> add_self_loops(edge_index)
(tensor([[0, 1, 0, 0, 1],
        [1, 0, 0, 0, 1]]),
None)

>>> add_self_loops(edge_index, edge_weight)
(tensor([[0, 1, 0, 0, 1],
        [1, 0, 0, 0, 1]]),
tensor([0.5000, 0.5000, 0.5000, 1.0000, 1.0000]))

>>> # edge features of self-loops are filled by constant `2.0`
>>> add_self_loops(edge_index, edge_weight,
...                fill_value=2.)
(tensor([[0, 1, 0, 0, 1],
        [1, 0, 0, 0, 1]]),
tensor([0.5000, 0.5000, 0.5000, 2.0000, 2.0000]))

>>> # Use 'add' operation to merge edge features for self-loops
>>> add_self_loops(edge_index, edge_weight,
...                fill_value='add')
(tensor([[0, 1, 0, 0, 1],
        [1, 0, 0, 0, 1]]),
tensor([0.5000, 0.5000, 0.5000, 1.0000, 0.5000]))
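The 'add' fill rule in the last example can be traced by hand with plain lists. The sketch below is an illustration only, with a hypothetical function name: each self-loop receives the sum of the weights of the edges pointing to that node.

```python
def add_self_loops_sum(edge_index, edge_weight, num_nodes):
    """Append one self-loop per node, weighted by the sum of incoming weights."""
    rows, cols = edge_index

    # Sum the weights of all edges that point to each node.
    incoming = [0.0] * num_nodes
    for target, weight in zip(cols, edge_weight):
        incoming[target] += weight

    loop_nodes = list(range(num_nodes))
    new_index = [rows + loop_nodes, cols + loop_nodes]
    new_weight = edge_weight + incoming
    return new_index, new_weight


edge_index = [[0, 1, 0], [1, 0, 0]]
edge_weight = [0.5, 0.5, 0.5]
new_index, new_weight = add_self_loops_sum(edge_index, edge_weight, num_nodes=2)
```

Node 0 is the target of two edges (1->0 and the existing 0->0), so its new self-loop gets 1.0; node 1 is the target of one edge, so it gets 0.5, in line with the output shown above.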

degree(index, num_nodes=None, dtype=None)#

Computes the (unweighted) degree of a given one-dimensional index tensor.

Parameters:

index (LongTensor) – Index tensor.
num_nodes (int, optional) – The number of nodes, i.e. max_val + 1 of index. (default: None)
dtype (torch.dtype, optional) – The desired data type of the returned tensor.

Return type:

Tensor

Example

>>> row = torch.tensor([0, 1, 0, 2, 0])
>>> degree(row, dtype=torch.long)
tensor([3, 1, 1])
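Conceptually, the unweighted degree is a count of how often each node id occurs in the index. A plain-Python sketch (count_degree is a hypothetical name, not the library function):

```python
def count_degree(index, num_nodes=None):
    """Count how often each node id appears in a one-dimensional index."""
    if num_nodes is None:
        # Same convention as above: largest id plus one.
        num_nodes = max(index) + 1 if index else 0
    counts = [0] * num_nodes
    for node in index:
        counts[node] += 1
    return counts


row = [0, 1, 0, 2, 0]
degrees = count_degree(row)
```

Passing num_nodes explicitly pads the result with zeros for nodes that never appear in the index.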

inv(a, overwrite_a=False, check_finite=True, *, assume_a=None, lower=False)#

Compute the inverse of a matrix.

If the data matrix is known to be of a particular type, supplying the corresponding string to the assume_a argument selects the dedicated solver. The available options are:

Matrix type                    assume_a value
general                        ‘general’ (or ‘gen’)
diagonal                       ‘diagonal’
upper triangular               ‘upper triangular’
lower triangular               ‘lower triangular’
symmetric positive definite    ‘pos’
symmetric                      ‘sym’
Hermitian                      ‘her’

For the ‘pos’ option, only the triangle of the input matrix specified in the lower argument is used, and the other triangle is not referenced. Likewise, an explicit assume_a=’diagonal’ means that off-diagonal elements are not referenced.

Array argument(s) of this function may have additional “batch” dimensions prepended to the core shape. In this case, the array is treated as a batch of lower-dimensional slices; see Batched Linear Operations for details.

Parameters:

a (array_like, shape (…, M, M)) – Square matrix (or a batch of matrices) to be inverted.
overwrite_a (bool, optional) – Discard data in a (may improve performance). Default is False.
check_finite (bool, optional) – Whether to check that the input matrix contains only finite numbers. Disabling may give a performance gain, but may result in problems (crashes, non-termination) if the inputs do contain infinities or NaNs.
assume_a (str, optional) – Valid entries are described above. If omitted or None, checks are performed to identify structure so the appropriate solver can be called.
lower (bool, optional) – Ignored unless assume_a is one of ‘sym’, ‘her’, or ‘pos’. If True, the calculation uses only the data in the lower triangle of a; entries above the diagonal are ignored. If False (default), the calculation uses only the data in the upper triangle of a; entries below the diagonal are ignored.

Returns:

ainv (ndarray) – Inverse of the matrix a.

Raises:

LinAlgError: If a is singular.
ValueError: If a is not square, or not 2D.

Notes

The input array a may represent a single matrix or a collection (a.k.a. a “batch”) of square matrices. For example, if a.shape == (4, 3, 2, 2), it is interpreted as a (4, 3)-shaped batch of \(2\times 2\) matrices.

This routine checks the condition number of the matrix a and emits a LinAlgWarning for ill-conditioned inputs.

Examples

>>> import numpy as np
>>> from scipy import linalg
>>> a = np.array([[1., 2.], [3., 4.]])
>>> linalg.inv(a)
array([[-2. ,  1. ],
       [ 1.5, -0.5]])
>>> np.dot(a, linalg.inv(a))
array([[ 1.,  0.],
       [ 0.,  1.]])

to_dense_adj(edge_index, batch=None, edge_attr=None, max_num_nodes=None, batch_size=None)#

Converts batched sparse adjacency matrices given by edge indices and edge attributes to a single dense batched adjacency matrix.

Parameters:

edge_index (LongTensor) – The edge indices.
batch (LongTensor, optional) – Batch vector \(\mathbf{b} \in {\{ 0, \ldots, B-1\}}^N\), which assigns each node to a specific example. (default: None)
edge_attr (Tensor, optional) – Edge weights or multi-dimensional edge features. If edge_index contains duplicated edges, the dense adjacency matrix output holds the summed up entries of edge_attr for duplicated edges. (default: None)
max_num_nodes (int, optional) – The size of the output node dimension. (default: None)
batch_size (int, optional) – The batch size. (default: None)

Return type:

Tensor

Examples

>>> edge_index = torch.tensor([[0, 0, 1, 2, 3],
...                            [0, 1, 0, 3, 0]])
>>> batch = torch.tensor([0, 0, 1, 1])
>>> to_dense_adj(edge_index, batch)
tensor([[[1., 1.],
        [1., 0.]],
        [[0., 1.],
        [1., 0.]]])

>>> to_dense_adj(edge_index, batch, max_num_nodes=4)
tensor([[[1., 1., 0., 0.],
        [1., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]],
        [[0., 1., 0., 0.],
        [1., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]])

>>> edge_attr = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0])
>>> to_dense_adj(edge_index, batch, edge_attr)
tensor([[[1., 2.],
        [3., 0.]],
        [[0., 4.],
        [5., 0.]]])
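For a single graph (no batch vector), the conversion amounts to scattering edge values into a square matrix and summing duplicates. The following plain-Python sketch illustrates this; dense_adjacency is a hypothetical name and nested lists stand in for tensors.

```python
def dense_adjacency(edge_index, edge_attr=None, num_nodes=None):
    """Build a dense adjacency matrix from COO edges, summing duplicate edges."""
    rows, cols = edge_index
    if num_nodes is None:
        num_nodes = max(rows + cols) + 1
    if edge_attr is None:
        edge_attr = [1.0] * len(rows)  # unweighted edges count as 1

    adj = [[0.0] * num_nodes for _ in range(num_nodes)]
    for row, col, value in zip(rows, cols, edge_attr):
        adj[row][col] += value  # duplicates accumulate
    return adj


# Edges of the first graph in the batched example above: 0->0, 0->1, 1->0.
adj = dense_adjacency([[0, 0, 1], [0, 1, 0]])

# Two copies of edge 0->1 with weights 2.0 and 3.0 are summed.
adj_dup = dense_adjacency([[0, 0], [1, 1]], edge_attr=[2.0, 3.0])
```

The first result matches the first slice of the batched output above, and the second shows the summing behavior described for duplicated edges.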