topobench.data.loaders.graph.graph_universe_loader module#

Loaders for GraphUniverse [1] datasets.

[1] Louis Van Langendonck, Guillermo Bernardez, Nina Miolane, and Pere Barlet-Ros. “GraphUniverse: Enabling Systematic Evaluation of Inductive Generalization.” Accepted at the Fourteenth International Conference on Learning Representations, 2026. https://openreview.net/forum?id=jRWxvQnqUt

class AbstractLoader(parameters)#

Bases: ABC

Abstract class that provides an interface to load data.

Parameters:

parameters (DictConfig) – Configuration parameters.

__init__(parameters)#

get_data_dir()#

Get the data directory.

Returns:

Path: The path to the dataset directory.

load(**kwargs)#

Load data.

Parameters:

**kwargs (dict) – Additional keyword arguments.

Returns:

tuple[torch_geometric.data.Data, str]: Tuple containing the loaded data and the data directory.

abstractmethod load_dataset()#

Load data into a dataset.

Returns:

Union[torch_geometric.data.Dataset, torch.utils.data.Dataset]: The loaded dataset, which could be a PyG or PyTorch dataset.

Raises:

NotImplementedError: If the method is not implemented.
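
The interface above can be sketched with the standard library alone. The classes below are hypothetical stand-ins, not TopoBench's implementation: a plain dict replaces DictConfig, the names SketchLoader and ListLoader and the num_items key are invented for illustration, and the assumption that the data directory is data_dir joined with data_name is ours.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class SketchLoader(ABC):
    """Hypothetical stand-in mirroring the documented loader interface."""

    def __init__(self, parameters: dict):
        self.parameters = parameters

    def get_data_dir(self) -> Path:
        # Assumption: the dataset lives in <data_dir>/<data_name>.
        return Path(self.parameters["data_dir"]) / self.parameters["data_name"]

    @abstractmethod
    def load_dataset(self, **kwargs):
        """Subclasses return a dataset object here."""
        raise NotImplementedError

    def load(self, **kwargs):
        # Pair the loaded dataset with the directory it is stored in.
        return self.load_dataset(**kwargs), str(self.get_data_dir())


class ListLoader(SketchLoader):
    """Toy subclass that 'loads' a list of integers instead of graphs."""

    def load_dataset(self, **kwargs):
        return list(range(self.parameters["num_items"]))


loader = ListLoader({"data_dir": "datasets", "data_name": "toy", "num_items": 3})
dataset, data_dir = loader.load()
print(dataset, data_dir)
```

Because load_dataset() is abstract, instantiating SketchLoader directly raises TypeError; only concrete subclasses such as ListLoader can be created.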

class Data(x=None, edge_index=None, edge_attr=None, y=None, pos=None, time=None, **kwargs)#

Bases: BaseData, FeatureStore, GraphStore

A data object describing a homogeneous graph. The data object can hold node-level, link-level and graph-level attributes. In general, Data tries to mimic the behavior of a regular Python dictionary. In addition, it provides useful functionality for analyzing graph structures, and provides basic PyTorch tensor functionalities. See the PyG documentation for the accompanying tutorial.

import torch
from torch_geometric.data import Data

data = Data(x=x, edge_index=edge_index, ...)

# Add additional arguments to `data`:
data.train_idx = torch.tensor([...], dtype=torch.long)
data.test_mask = torch.tensor([...], dtype=torch.bool)

# Analyzing the graph structure:
data.num_nodes
>>> 23

data.is_directed()
>>> False

# PyTorch tensor functionality:
data = data.pin_memory()
data = data.to('cuda:0', non_blocking=True)

Parameters:

x (torch.Tensor, optional) – Node feature matrix with shape [num_nodes, num_node_features]. (default: None)
edge_index (LongTensor, optional) – Graph connectivity in COO format with shape [2, num_edges]. (default: None)
edge_attr (torch.Tensor, optional) – Edge feature matrix with shape [num_edges, num_edge_features]. (default: None)
y (torch.Tensor, optional) – Graph-level or node-level ground-truth labels with arbitrary shape. (default: None)
pos (torch.Tensor, optional) – Node position matrix with shape [num_nodes, num_dimensions]. (default: None)
time (torch.Tensor, optional) – The timestamps for each event with shape [num_edges] or [num_nodes]. (default: None)
**kwargs (optional) – Additional attributes.

classmethod from_dict(mapping)#

Creates a Data object from a dictionary.

__init__(x=None, edge_index=None, edge_attr=None, y=None, pos=None, time=None, **kwargs)#

connected_components()#

Extracts connected components of the graph using a union-find algorithm. The components are returned as a list of Data objects, where each object represents a connected component of the graph.

import torch
from torch_geometric.data import Data

data = Data()
data.x = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
data.y = torch.tensor([[1.1], [2.1], [3.1], [4.1]])
data.edge_index = torch.tensor(
    [[0, 1, 2, 3], [1, 0, 3, 2]], dtype=torch.long
)

components = data.connected_components()
print(len(components))
>>> 2

# Print the whole component (not just its x tensor) to see its summary.
print(components[0])
>>> Data(x=[2, 1], y=[2, 1], edge_index=[2, 2])

Returns: A list of disconnected components.
Return type: List[Data]
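
To make the union-find idea concrete, here is a minimal pure-Python sketch run on the same edge list as the example above. It is an illustration of the technique only, not PyG's implementation: it works on plain lists rather than tensors and returns lists of node ids rather than Data objects.

```python
def connected_components(num_nodes, edge_index):
    """Group node ids into connected components with union-find."""
    parent = list(range(num_nodes))

    def find(node):
        # Walk up to the root, halving the path along the way.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    sources, targets = edge_index
    for u, v in zip(sources, targets):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_v] = root_u  # merge the two sets

    # Collect the nodes that share a root into one component.
    groups = {}
    for node in range(num_nodes):
        groups.setdefault(find(node), []).append(node)
    return list(groups.values())


edge_index = [[0, 1, 2, 3], [1, 0, 3, 2]]
components = connected_components(4, edge_index)
print(components)  # [[0, 1], [2, 3]]
```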

debug()#

edge_subgraph(subset)#

Returns the induced subgraph given by the edge indices subset. Will currently preserve all the nodes in the graph, even if they are isolated after subgraph computation.

Parameters: subset (LongTensor or BoolTensor) – The edges to keep.

get_all_edge_attrs()#

Returns all registered edge attributes.

get_all_tensor_attrs()#

Obtains all feature attributes stored in Data.

is_edge_attr(key)#

Returns True if the object stored under the given key denotes an edge-level tensor attribute.

is_node_attr(key)#

Returns True if the object stored under the given key denotes a node-level tensor attribute.

stores_as(data)#

subgraph(subset)#

Returns the induced subgraph given by the node indices subset.

Parameters: subset (LongTensor or BoolTensor) – The nodes to keep.
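
As an illustration of what "induced subgraph" means, the following pure-Python sketch keeps only the edges whose two endpoints are both in the node subset. The helper induced_subgraph is hypothetical, and relabelling the kept nodes to 0..k-1 is an assumption of this sketch rather than a documented guarantee of the method above.

```python
def induced_subgraph(edge_index, subset):
    """Keep edges with both endpoints in subset, relabelled to 0..k-1."""
    # Assumption of this sketch: kept nodes are renumbered in sorted order.
    new_id = {old: new for new, old in enumerate(sorted(subset))}
    sources, targets = [], []
    for u, v in zip(*edge_index):
        if u in new_id and v in new_id:
            sources.append(new_id[u])
            targets.append(new_id[v])
    return [sources, targets]


# Undirected path 0 - 1 - 2 - 3, stored as directed edge pairs.
edge_index = [[0, 1, 1, 2, 2, 3], [1, 0, 2, 1, 3, 2]]
sub = induced_subgraph(edge_index, subset=[1, 2, 3])
print(sub)  # [[0, 1, 1, 2], [1, 0, 2, 1]]
```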

to_dict()#

Returns a dictionary of stored key/value pairs.

to_heterogeneous(node_type=None, edge_type=None, node_type_names=None, edge_type_names=None)#

Converts a Data object to a heterogeneous HeteroData object. For this, node and edge attributes are split according to the node-level and edge-level vectors node_type and edge_type, respectively. node_type_names and edge_type_names can be used to give meaningful node and edge type names, respectively. That is, the node_type 0 is given by node_type_names[0]. If the Data object was constructed via to_homogeneous(), the object can be reconstructed without any need to pass in additional arguments.

Parameters:

node_type (torch.Tensor, optional) – A node-level vector denoting the type of each node. (default: None)
edge_type (torch.Tensor, optional) – An edge-level vector denoting the type of each edge. (default: None)
node_type_names (List[str], optional) – The names of node types. (default: None)
edge_type_names (List[Tuple[str, str, str]], optional) – The names of edge types. (default: None)
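
The splitting step can be pictured with a small pure-Python sketch. The helper split_by_type and the type names "paper" and "author" are hypothetical; the sketch only shows how a per-node type vector partitions node attributes and how node_type_names[i] names type i. It does not build a HeteroData object.

```python
def split_by_type(values, type_ids, type_names):
    """Partition per-node values by their integer type id."""
    groups = {name: [] for name in type_names}
    for value, type_id in zip(values, type_ids):
        # Type id i is named type_names[i].
        groups[type_names[type_id]].append(value)
    return groups


x = [[1.0], [2.0], [3.0]]          # one feature row per node
node_type = [0, 1, 0]              # node-level type vector
node_type_names = ["paper", "author"]

per_type = split_by_type(x, node_type, node_type_names)
print(per_type)  # {'paper': [[1.0], [3.0]], 'author': [[2.0]]}
```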

to_namedtuple()#

Returns a NamedTuple of stored key/value pairs.

update(data)#

Updates the data object with the elements from another data object. Added elements will override existing ones (in case of duplicates).

validate(raise_on_error=True)#

Validates the correctness of the data.

property batch: Tensor | None#

property edge_attr: Tensor | None#

property edge_index: Tensor | None#

property edge_stores: List[EdgeStorage]#

property edge_weight: Tensor | None#

property face: Tensor | None#

property node_stores: List[NodeStorage]#

property num_edge_features: int#: Returns the number of features per edge in the graph.

property num_edge_types: int#: Returns the number of edge types in the graph.

property num_faces: int | None#: Returns the number of faces in the mesh.

property num_features: int#: Returns the number of features per node in the graph. Alias for num_node_features.

property num_node_features: int#: Returns the number of features per node in the graph.

property num_node_types: int#: Returns the number of node types in the graph.

property num_nodes: int | None#: Returns the number of nodes in the graph.

Note

The number of nodes in the data object is automatically inferred in case node-level attributes are present, e.g., data.x. In some cases, however, a graph may only be given without any node-level attributes. PyG then guesses the number of nodes according to edge_index.max().item() + 1. However, if the graph contains isolated nodes, this guess can be too low, which can result in unexpected behavior. Thus, we recommend setting the number of nodes in your data object explicitly via data.num_nodes = .... You will be given a warning that requests you to do so.
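
A small pure-Python sketch shows why the fallback guess can undercount. The helper infer_num_nodes is hypothetical and only mimics the "largest node id plus one" rule on plain lists.

```python
def infer_num_nodes(edge_index):
    """Mimic the fallback guess: largest node id in edge_index, plus one."""
    return max(max(edge_index[0]), max(edge_index[1])) + 1


# Five nodes in total, but nodes 3 and 4 are isolated (they have no edges).
true_num_nodes = 5
edge_index = [[0, 1, 1, 2], [1, 0, 2, 1]]

guessed = infer_num_nodes(edge_index)
print(guessed, true_num_nodes)  # 3 5
```

The guess misses the two isolated nodes, which is why setting the number of nodes explicitly is the safer choice when no node-level attributes are stored.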

property pos: Tensor | None#

property stores: List[BaseStorage]#

property time: Tensor | None#

property x: Tensor | None#

property y: Tensor | int | float | None#

class Dataset(root=None, transform=None, pre_transform=None, pre_filter=None, log=True, force_reload=False)#

Bases: Dataset

Dataset base class for creating graph datasets. See the PyG documentation for the accompanying tutorial.

Parameters:

root (str, optional) – Root directory where the dataset should be saved. (default: None)
transform (callable, optional) – A function/transform that takes in a Data or HeteroData object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a Data or HeteroData object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a Data or HeteroData object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
log (bool, optional) – Whether to print any console output while downloading and processing the dataset. (default: True)
force_reload (bool, optional) – Whether to re-process the dataset. (default: False)

__init__(root=None, transform=None, pre_transform=None, pre_filter=None, log=True, force_reload=False)#

download()#

Downloads the dataset to the self.raw_dir folder.

get(idx)#

Gets the data object at index idx.

get_summary()#

Collects summary statistics for the dataset.

index_select(idx)#

Creates a subset of the dataset from specified indices idx. Indices idx can be a slicing object, e.g., [2:5], a list, a tuple, or a torch.Tensor or np.ndarray of type long or bool.
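
The accepted index kinds can be illustrated on a plain list. The helper below is a hypothetical sketch of the selection logic only; it handles slices, integer sequences, and boolean masks on Python lists, and does not cover tensors or NumPy arrays.

```python
def index_select(items, idx):
    """Select items by slice, by integer indices, or by boolean mask."""
    if isinstance(idx, slice):
        return items[idx]
    idx = list(idx)
    if idx and all(isinstance(i, bool) for i in idx):
        # Boolean mask: keep the positions marked True.
        return [item for item, keep in zip(items, idx) if keep]
    # Integer indices: gather in the order given.
    return [items[i] for i in idx]


graphs = ["g0", "g1", "g2", "g3", "g4", "g5"]

by_slice = index_select(graphs, slice(2, 5))
by_list = index_select(graphs, [0, 5])
by_mask = index_select(graphs, [True, False, False, True, False, False])
print(by_slice, by_list, by_mask)
```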

indices()#

len()#

Returns the number of data objects stored in the dataset.

print_summary(fmt='psql')#

Prints summary statistics of the dataset to the console.

Parameters: fmt (str, optional) – Summary table format. Available table formats are listed in the tabulate documentation. (default: "psql")

process()#

Processes the dataset to the self.processed_dir folder.

shuffle(return_perm=False)#

Randomly shuffles the examples in the dataset.

Parameters: return_perm (bool, optional) – If set to True, will also return the random permutation used to shuffle the dataset. (default: False)

to_datapipe()#

Converts the dataset into a torch.utils.data.DataPipe.

The returned instance can then be used with PyG's built-in DataPipes for batching graphs as follows:

from torch_geometric.datasets import QM9

dp = QM9(root='./data/QM9/').to_datapipe()
dp = dp.batch_graphs(batch_size=2, drop_last=True)

for batch in dp:
    pass

See the PyTorch tutorial for further background on DataPipes.

property has_download: bool#: Checks whether the dataset defines a download() method.

property has_process: bool#: Checks whether the dataset defines a process() method.

property num_classes: int#: Returns the number of classes in the dataset.

property num_edge_features: int#: Returns the number of features per edge in the dataset.

property num_features: int#: Returns the number of features per node in the dataset. Alias for num_node_features.

property num_node_features: int#: Returns the number of features per node in the dataset.

property processed_dir: str#

property processed_file_names: str | List[str] | Tuple[str, ...]#: The name of the files in the self.processed_dir folder that must be present in order to skip processing.

property processed_paths: List[str]#: The absolute filepaths that must be present in order to skip processing.

property raw_dir: str#

property raw_file_names: str | List[str] | Tuple[str, ...]#: The name of the files in the self.raw_dir folder that must be present in order to skip downloading.

property raw_paths: List[str]#: The absolute filepaths that must be present in order to skip downloading.

class DictConfig(content, key=None, parent=None, ref_type=typing.Any, key_type=typing.Any, element_type=typing.Any, is_optional=True, flags=None)#

Bases: BaseContainer, MutableMapping[Any, Any]

__init__(content, key=None, parent=None, ref_type=typing.Any, key_type=typing.Any, element_type=typing.Any, is_optional=True, flags=None)#

copy()#

get(key, default_value=None)#

Return the value for key if key is in the dictionary, else default_value (defaulting to None).

items() → a set-like object providing a view on D's items#

items_ex(resolve=True, keys=None)#

keys() → a set-like object providing a view on D's keys#

pop(k[, d]) → v, remove specified key and return the corresponding value.#

If key is not found, d is returned if given, otherwise KeyError is raised.

setdefault(k[, d]) → D.get(k,d), also set D[k]=d if k not in D#

class GraphUniverseDataset(root, parameters, name=None, graph_list=None, **kwargs)#

Bases: InMemoryDataset

Dataset class for GraphUniverse datasets.

Parameters:

root (str) – Root directory where the dataset will be saved.
name (str) – Name of the dataset.
parameters (DictConfig) – Configuration parameters for the dataset.
**kwargs (dict) – Additional keyword arguments.

__init__(root, parameters, name=None, graph_list=None, **kwargs)#

download()#

Generate the dataset.

get_data_dir()#

Return the path to the data directory.

Returns:

str: Path to the data directory.

get_dataset_dir(config)#

Generate a unique dataset directory based on the configuration.

Parameters: config (dict) – Configuration dictionary.
Returns: Unique dataset directory.
Return type: str
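
One common way to derive a unique directory from a configuration is to hash a canonical serialisation of it. The sketch below shows that idea under stated assumptions: the function name dataset_dir_name, the prefix, the 12-character digest, and the config keys n_graphs and seed are all hypothetical, and the actual method may use a different scheme.

```python
import hashlib
import json


def dataset_dir_name(config: dict, prefix: str = "graph_universe") -> str:
    """Derive a directory name that changes whenever the config changes."""
    # Sort keys so that key order does not affect the hash.
    canonical = json.dumps(config, sort_keys=True, default=str)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
    return f"{prefix}_{digest}"


config_a = {"n_graphs": 100, "seed": 0}
config_b = {"seed": 0, "n_graphs": 100}   # same content, different key order
config_c = {"n_graphs": 100, "seed": 1}   # different seed

print(dataset_dir_name(config_a) == dataset_dir_name(config_b))
print(dataset_dir_name(config_a) == dataset_dir_name(config_c))
```

With this design, two runs with the same settings reuse the same directory, while any change to the settings produces a fresh one.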

process()#

Handle the data for the dataset.

property processed_dir: str#

Return the path to the processed directory of the dataset.

Returns:

str: Path to the processed directory.

property processed_file_names: str#

Return the processed file name for the dataset.

Returns:

str: Processed file name.

property raw_dir: str#

Return the path to the raw directory of the dataset.

Returns:

str: Path to the raw directory.

property raw_file_names: list[str]#

Return the raw file names for the dataset.

Returns:

list[str]: List of raw file names.

class GraphUniverseDatasetLoader(parameters)#

Bases: AbstractLoader

Load GraphUniverse datasets.

Parameters:

parameters (DictConfig) – Configuration parameters containing:

data_dir: Root directory for data
data_name: Name of the dataset
data_type: Type of the dataset (e.g., “graph_classification”)
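
A configuration with the three keys listed above might look like the following. This is a sketch using a plain dict in place of DictConfig; the values and the helper missing_keys are hypothetical, and the real loader may expect further keys.

```python
REQUIRED_KEYS = ("data_dir", "data_name", "data_type")


def missing_keys(parameters):
    """Return the documented keys that are absent from parameters."""
    return [key for key in REQUIRED_KEYS if key not in parameters]


# Hypothetical values, chosen only for illustration.
parameters = {
    "data_dir": "datasets/graph",
    "data_name": "GraphUniverse",
    "data_type": "graph_classification",
}

print(missing_keys(parameters))                 # []
print(missing_keys({"data_dir": "datasets"}))   # ['data_name', 'data_type']
```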

__init__(parameters)#

load(**kwargs)#

Load data.

Parameters:

**kwargs (dict) – Additional keyword arguments.

Returns:

tuple[torch_geometric.data.Data, str]: Tuple containing the loaded data and the data directory.

load_dataset()#

Load the GraphUniverse dataset.

Returns:

Dataset: The loaded GraphUniverse dataset.

Raises:

RuntimeError: If dataset loading fails.