topobench.data.loaders package#

Init file for the loaders package.

class ADMEDatasetLoader(parameters)#

Bases: AbstractLoader

Load TDC ADME datasets with SMILES to graph conversion using OGB featurization.

This loader:
  1. Loads ADME datasets from TDC (Therapeutics Data Commons)

  2. Converts SMILES strings to PyG graphs using OGB’s standard featurization

  3. Uses fixed scaffold splits from TDC

  4. Returns graphs compatible with OGB molecular property prediction

Node features (9-dimensional):
  • Atomic number

  • Chirality

  • Degree

  • Formal charge

  • Number of hydrogens

  • Number of radical electrons

  • Hybridization

  • Is aromatic

  • Is in ring

Edge features (3-dimensional):
  • Bond type

  • Bond stereochemistry

  • Is conjugated
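The feature layout above can be sketched as a pair of name tuples. This is a minimal illustration using only the standard library: the snake_case names, the `describe` helper, and the assumption that index order follows the bullet order are all hypothetical, and the integer codes are made up rather than taken from a real molecule.

```python
# Feature names in the order listed above. The index order is an assumption
# that mirrors the bullet order; this page does not state it explicitly.
NODE_FEATURES = (
    "atomic_number",
    "chirality",
    "degree",
    "formal_charge",
    "num_hydrogens",
    "num_radical_electrons",
    "hybridization",
    "is_aromatic",
    "is_in_ring",
)
EDGE_FEATURES = ("bond_type", "bond_stereochemistry", "is_conjugated")


def describe(vector, names):
    """Pair each integer feature with its name, checking the expected width."""
    if len(vector) != len(names):
        raise ValueError(f"expected {len(names)} features, got {len(vector)}")
    return dict(zip(names, vector))


# Made-up integer codes for one atom and one bond (illustrative only).
atom = describe([5, 0, 4, 5, 3, 0, 2, 0, 0], NODE_FEATURES)
bond = describe([0, 0, 0], EDGE_FEATURES)
print(len(atom), len(bond))  # 9 3
```

A width check like this can catch a mismatch between a model's expected input size and the 9-dimensional node or 3-dimensional edge features early.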

Parameters:
parameters : DictConfig
Configuration parameters containing:
  • data_dir: Root directory for data

  • data_name: Name of the ADME dataset

  • data_type: Type of the dataset (e.g., “ADME”)

__init__(parameters)#
get_data_dir()#

Get the data directory.

Returns:
Path

The path to the dataset directory. Format: {root_data_dir}/{dataset_name}/. Example: data/graph/ADME/BBB_Martins/.
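The path format above can be reproduced with `pathlib`. This is a sketch only: `dataset_dir` is a hypothetical helper, not the loader's actual implementation, and the argument values are taken from the example path.

```python
from pathlib import Path


def dataset_dir(root_data_dir, dataset_name):
    """Join the root data directory and the dataset name."""
    return Path(root_data_dir) / dataset_name


path = dataset_dir("data/graph/ADME", "BBB_Martins")
print(path.as_posix())  # data/graph/ADME/BBB_Martins
```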

load_dataset()#

Load the ADME dataset with predefined scaffold splits.

Returns:
InMemoryDataset

The dataset with converted graphs and predefined splits.

Raises:
RuntimeError

If dataset loading or SMILES conversion fails.

ValueError

If invalid SMILES strings are encountered.

ImportError

If PyTDC or rdkit (via ogb) are not installed.
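A caller may want to tell the three documented exceptions apart. The sketch below shows one way to do that with stand-in objects: `load_or_report`, `FailingLoader`, and `WorkingLoader` are hypothetical names, and the stubs only mimic the `load_dataset()` call rather than loading any real data.

```python
def load_or_report(loader):
    """Call loader.load_dataset() and map documented errors to short messages."""
    try:
        return loader.load_dataset(), None
    except ImportError as err:
        return None, f"missing dependency: {err}"
    except ValueError as err:
        return None, f"invalid SMILES: {err}"
    except RuntimeError as err:
        return None, f"loading failed: {err}"


class FailingLoader:
    """Hypothetical stub that always raises, used only to exercise the helper."""

    def load_dataset(self):
        raise ValueError("C1CC")  # unclosed ring, so not a valid SMILES string


class WorkingLoader:
    """Hypothetical stub that returns a list in place of a real dataset."""

    def load_dataset(self):
        return ["graph_0"]


dataset, error = load_or_report(FailingLoader())
print(error)  # invalid SMILES: C1CC
```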

class AbstractLoader(parameters)#

Bases: ABC

Abstract class that provides an interface to load data.

Parameters:
parameters : DictConfig

Configuration parameters.

__init__(parameters)#
get_data_dir()#

Get the data directory.

Returns:
Path

The path to the dataset directory.

load(**kwargs)#

Load data.

Parameters:
**kwargs : dict

Additional keyword arguments.

Returns:
tuple[torch_geometric.data.Data, str]

Tuple containing the loaded data and the data directory.

abstractmethod load_dataset()#

Load data into a dataset.

Returns:
Union[torch_geometric.data.Dataset, torch.utils.data.Dataset]

The loaded dataset, which could be a PyG or PyTorch dataset.

Raises:
NotImplementedError

If the method is not implemented.
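The interface described above follows the usual abstract-base-class pattern. The sketch below is a stand-in written with the standard library, not the TopoBench source: `SketchLoader` and `ToyLoader` are hypothetical, the `<data_dir>/<data_name>` directory layout is an assumption, and forwarding `**kwargs` from `load` to `load_dataset` is also an assumption.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class SketchLoader(ABC):
    """Stand-in for the interface described above; not the TopoBench class."""

    def __init__(self, parameters):
        self.parameters = parameters

    def get_data_dir(self):
        # Assumption: the dataset directory is <data_dir>/<data_name>.
        return Path(self.parameters["data_dir"]) / self.parameters["data_name"]

    def load(self, **kwargs):
        # Return the dataset together with its directory as a string.
        dataset = self.load_dataset(**kwargs)
        return dataset, self.get_data_dir().as_posix()

    @abstractmethod
    def load_dataset(self, **kwargs):
        raise NotImplementedError


class ToyLoader(SketchLoader):
    """Hypothetical subclass that returns a list instead of a real dataset."""

    def load_dataset(self, **kwargs):
        return ["graph_0", "graph_1"]


loader = ToyLoader({"data_dir": "data/graph", "data_name": "toy"})
dataset, data_dir = loader.load()
print(data_dir)  # data/graph/toy
```

Because `load_dataset` is abstract, instantiating `SketchLoader` directly raises `TypeError`; only subclasses that implement it can be created.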

class CitationHypergraphDatasetLoader(parameters)#

Bases: AbstractLoader

Load Citation Hypergraph dataset with configurable parameters.

Parameters:
parameters : DictConfig
Configuration parameters containing:
  • data_dir: Root directory for data

  • data_name: Name of the dataset

  • other relevant parameters

__init__(parameters)#
load_dataset()#

Load the Citation Hypergraph dataset.

Returns:
CitationHypergraphDataset

The loaded Citation Hypergraph dataset with the appropriate data_dir.

Raises:
RuntimeError

If dataset loading fails.

class GeometricShapesDatasetLoader(parameters)#

Bases: AbstractLoader

Load GeometricShapes dataset.

Parameters:
parameters : DictConfig
Configuration parameters containing:
  • data_dir: Root directory for data

__init__(parameters)#
load_dataset()#

Load GeometricShapes dataset.

Returns:
Dataset

The loaded GeometricShapes dataset.

Raises:
RuntimeError

If dataset loading fails.

class GraphUniverseDatasetLoader(parameters)#

Bases: AbstractLoader

Load Graph Universe datasets.

Parameters:
parameters : DictConfig
Configuration parameters containing:
  • data_dir: Root directory for data

  • data_name: Name of the dataset

  • data_type: Type of the dataset (e.g., “graph_classification”)

__init__(parameters)#
load(**kwargs)#

Load data.

Parameters:
**kwargs : dict

Additional keyword arguments.

Returns:
tuple[torch_geometric.data.Data, str]

Tuple containing the loaded data and the data directory.

load_dataset()#

Load Graph Universe dataset.

Returns:
Dataset

The loaded Graph Universe dataset.

Raises:
RuntimeError

If dataset loading fails.

class HeterophilousGraphDatasetLoader(parameters)#

Bases: AbstractLoader

Load Heterophilous Graph datasets.

Parameters:
parameters : DictConfig
Configuration parameters containing:
  • data_dir: Root directory for data

  • data_name: Name of the dataset

  • data_type: Type of the dataset (e.g., “heterophilous”)

__init__(parameters)#
load_dataset()#

Load Heterophilous Graph dataset.

Returns:
Dataset

The loaded Heterophilous Graph dataset.

Raises:
RuntimeError

If dataset loading fails.

class HypergraphDatasetLoader(parameters)#

Bases: AbstractLoader

Load a hypergraph dataset with configurable parameters.

Parameters:
parameters : DictConfig
Configuration parameters containing:
  • data_dir: Root directory for data

  • data_name: Name of the dataset

  • other relevant parameters

__init__(parameters)#
load_dataset()#

Load the hypergraph dataset.

Returns:
HypergraphDataset

The loaded hypergraph dataset with the appropriate data_dir.

Raises:
RuntimeError

If dataset loading fails.

class MantraSimplicialDatasetLoader(parameters, **kwargs)#

Bases: AbstractLoader

Load Mantra dataset with configurable parameters.

Note: for simplicial datasets, the class name must include DatasetLoader.
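This page does not say why the naming rule exists. One plausible reason is name-based class discovery, which is an assumption here. The sketch below filters a namespace by that substring; `find_loader_classes` and the two empty classes are hypothetical stand-ins.

```python
def find_loader_classes(namespace):
    """Return names of classes in `namespace` whose name contains 'DatasetLoader'."""
    return sorted(
        name
        for name, obj in namespace.items()
        if isinstance(obj, type) and "DatasetLoader" in name
    )


# Empty stand-in classes, used only to exercise the name filter.
class MantraSimplicialDatasetLoader:
    pass


class MantraSimplicialLoader:  # misses the required substring
    pass


candidates = {
    "MantraSimplicialDatasetLoader": MantraSimplicialDatasetLoader,
    "MantraSimplicialLoader": MantraSimplicialLoader,
}
print(find_loader_classes(candidates))  # ['MantraSimplicialDatasetLoader']
```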

Parameters:
parameters : DictConfig
Configuration parameters containing:
  • data_dir: Root directory for data

  • data_name: Name of the dataset

  • other relevant parameters

**kwargs : dict

Additional keyword arguments.

__init__(parameters, **kwargs)#
load(**kwargs)#

Load the Mantra dataset.

Parameters:
**kwargs : dict

Additional keyword arguments for dataset initialization.

Returns:
MantraDataset

The loaded Mantra dataset with the appropriate data_dir.

Raises:
RuntimeError

If dataset loading fails.

load_dataset(**kwargs)#

Initialize the Mantra dataset.

Parameters:
**kwargs : dict

Additional keyword arguments for dataset initialization.

Returns:
MantraDataset

The initialized dataset instance.

class ManualGraphDatasetLoader(parameters)#

Bases: AbstractLoader

Load manually provided graph datasets.

Parameters:
parameters : DictConfig
Configuration parameters containing:
  • data_name: Name of the dataset

  • data_dir: Root directory for data

__init__(parameters)#
get_data_dir()#

Get the data directory.

Returns:
Path

The path to the dataset directory.

load_dataset()#

Load the manual graph dataset.

Returns:
DataloadDataset

The dataset object containing the manually loaded graph.

class MoleculeDatasetLoader(parameters)#

Bases: AbstractLoader

Load molecule datasets (ZINC and AQSOL) with predefined splits, or QM9.

Parameters:
parameters : DictConfig
Configuration parameters containing:
  • data_dir: Root directory for data

  • data_name: Name of the dataset

  • data_type: Type of the dataset (e.g., “molecule”)

  • qm9_target_index: (QM9 only) Which of the 19 regression targets to use (default 0).
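The effect of qm9_target_index can be pictured as picking one column out of the 19 regression targets. The sketch below uses a plain list in place of a target tensor: `select_target` is a hypothetical helper, the placeholder values are not real QM9 targets, and rejecting out-of-range indices is an assumption about desirable behavior rather than documented behavior.

```python
NUM_QM9_TARGETS = 19  # number of regression targets stated above


def select_target(targets, target_index=0):
    """Return the single regression target at `target_index` (default 0)."""
    if len(targets) != NUM_QM9_TARGETS:
        raise ValueError(f"expected {NUM_QM9_TARGETS} targets, got {len(targets)}")
    if not 0 <= target_index < NUM_QM9_TARGETS:
        raise IndexError(f"target_index must be in [0, {NUM_QM9_TARGETS - 1}]")
    return targets[target_index]


# Placeholder values standing in for one molecule's 19 targets.
targets = [float(i) for i in range(NUM_QM9_TARGETS)]
print(select_target(targets))     # 0.0
print(select_target(targets, 7))  # 7.0
```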

__init__(parameters)#
get_data_dir()#

Get the data directory.

Returns:
Path

The path to the dataset directory.

load_dataset()#

Load the molecule dataset with predefined splits.

Returns:
Dataset

The combined dataset with predefined splits.

Raises:
RuntimeError

If dataset loading fails.

class OGBGDatasetLoader(parameters)#

Bases: AbstractLoader

Load OGB graph property prediction datasets (molhiv, molpcba, ppa) with predefined splits.

Parameters:
parameters : DictConfig
Configuration parameters containing:
  • data_dir: Root directory for data

  • data_name: Name of the dataset

  • data_type: Type of the dataset (e.g., “molecule”)

__init__(parameters)#
get_data_dir()#

Get the data directory.

Returns:
Path

The path to the dataset directory.

load_dataset()#

Load the OGB graph dataset with predefined splits.

Returns:
Dataset

The combined dataset with predefined splits.

Raises:
RuntimeError

If dataset loading fails.

class PlanetoidDatasetLoader(parameters)#

Bases: AbstractLoader

Load Planetoid datasets.

Parameters:
parameters : DictConfig
Configuration parameters containing:
  • data_dir: Root directory for data

  • data_name: Name of the dataset

  • data_type: Type of the dataset (e.g., “cocitation”)

__init__(parameters)#
load_dataset()#

Load Planetoid dataset.

Returns:
Dataset

The loaded Planetoid dataset.

Raises:
RuntimeError

If dataset loading fails.

class TUDatasetLoader(parameters)#

Bases: AbstractLoader

Load TU datasets.

Parameters:
parameters : DictConfig
Configuration parameters containing:
  • data_dir: Root directory for data

  • data_name: Name of the dataset

  • data_type: Type of the dataset (e.g., “graph_classification”)

__init__(parameters)#
load_dataset()#

Load TU dataset.

Returns:
Dataset

The loaded TU dataset.

Raises:
RuntimeError

If dataset loading fails.

class USCountyDemosDatasetLoader(parameters)#

Bases: AbstractLoader

Load US County Demos dataset with configurable year and task variable.

Parameters:
parameters : DictConfig
Configuration parameters containing:
  • data_dir: Root directory for data

  • data_name: Name of the dataset

  • year: Year of the dataset (if applicable)

  • task_variable: Task variable for the dataset

__init__(parameters)#
load_dataset()#

Load the US County Demos dataset.

Returns:
USCountyDemosDataset

The loaded US County Demos dataset with the appropriate data_dir.

Raises:
RuntimeError

If dataset loading fails.

Subpackages#

Submodules#