topobench.data.loaders.graph package
Init file for graph load module with automated loader discovery.
- class ADMEDatasetLoader(parameters)
Bases: AbstractLoader
Load TDC ADME datasets with SMILES-to-graph conversion using OGB featurization.
This loader:
1. Loads ADME datasets from TDC (Therapeutics Data Commons)
2. Converts SMILES strings to PyG graphs using OGB’s standard featurization
3. Uses fixed scaffold splits from TDC
4. Returns graphs compatible with OGB molecular property prediction
- Node features (9-dimensional):
Atomic number
Chirality
Degree
Formal charge
Number of hydrogens
Number of radical electrons
Hybridization
Is aromatic
Is in ring
- Edge features (3-dimensional):
Bond type
Bond stereochemistry
Is conjugated
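As an illustration of the feature layout above, the following minimal sketch attaches names to one row of node or edge features. The constant and function names are hypothetical, the column order is assumed to follow the order in which the features are listed here, and the sample row is made up.

```python
# Feature names in the order listed above (assumed to match the column order).
NODE_FEATURES = [
    "atomic_number", "chirality", "degree", "formal_charge", "num_hydrogens",
    "num_radical_electrons", "hybridization", "is_aromatic", "is_in_ring",
]
EDGE_FEATURES = ["bond_type", "bond_stereo", "is_conjugated"]


def label_features(values, names):
    """Pair a raw feature row with its names; raise if the lengths differ."""
    if len(values) != len(names):
        raise ValueError(f"expected {len(names)} values, got {len(values)}")
    return dict(zip(names, values))


# Hypothetical feature row for one atom (made-up categorical indices).
atom_row = [5, 0, 4, 5, 3, 0, 2, 0, 0]
labelled = label_features(atom_row, NODE_FEATURES)
```

A helper like this is only a reading aid when inspecting a converted graph; the loader itself returns the features as plain tensors.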
- Parameters:
- parameters : DictConfig
- Configuration parameters containing:
data_dir: Root directory for data
data_name: Name of the ADME dataset
data_type: Type of the dataset (e.g., “ADME”)
- __init__(parameters)
- get_data_dir()
Get the data directory.
- Returns:
- Path
The path to the dataset directory. Format: {root_data_dir}/{dataset_name}/. Example: data/graph/ADME/BBB_Martins/.
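The documented path format can be sketched with pathlib. This is a minimal sketch, not the loader's actual implementation: the function name build_data_dir and its two arguments are hypothetical stand-ins for the values the loader reads from its configuration.

```python
from pathlib import Path


def build_data_dir(root_data_dir, dataset_name):
    """Join the root data directory and the dataset name into one path."""
    return Path(root_data_dir) / dataset_name


# Reproduces the example from the description above.
path = build_data_dir("data/graph/ADME", "BBB_Martins")
```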
- load_dataset()
Load the ADME dataset with predefined scaffold splits.
- Returns:
- InMemoryDataset
The dataset with converted graphs and predefined splits.
- Raises:
- RuntimeError
If dataset loading or SMILES conversion fails.
- ValueError
If invalid SMILES strings are encountered.
- ImportError
If PyTDC or rdkit (via ogb) are not installed.
- class GraphUniverseDatasetLoader(parameters)
Bases: AbstractLoader
Load Graph Universe datasets.
- Parameters:
- parameters : DictConfig
- Configuration parameters containing:
data_dir: Root directory for data
data_name: Name of the dataset
data_type: Type of the dataset (e.g., “graph_classification”)
- __init__(parameters)
- load(**kwargs)
Load data.
- Parameters:
- **kwargs : dict
Additional keyword arguments.
- Returns:
- tuple[torch_geometric.data.Data, str]
Tuple containing the loaded data and the data directory.
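Because load() returns a pair, callers typically unpack it. The sketch below only shows that calling pattern: StubLoader is a hypothetical stand-in, not the real loader, and both returned values are placeholders.

```python
class StubLoader:
    """Hypothetical stand-in that mimics the documented return shape of load()."""

    def load(self, **kwargs):
        data = {"num_nodes": 3}          # placeholder for a torch_geometric.data.Data object
        data_dir = "data/graph/example"  # placeholder directory string
        return data, data_dir


# Unpack the (data, data_dir) tuple in one step.
data, data_dir = StubLoader().load()
```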
- load_dataset()
Load Graph Universe dataset.
- Returns:
- Dataset
The loaded Graph Universe dataset.
- Raises:
- RuntimeError
If dataset loading fails.
- class HeterophilousGraphDatasetLoader(parameters)
Bases: AbstractLoader
Load Heterophilous Graph datasets.
- Parameters:
- parameters : DictConfig
- Configuration parameters containing:
data_dir: Root directory for data
data_name: Name of the dataset
data_type: Type of the dataset (e.g., “heterophilous”)
- __init__(parameters)
- load_dataset()
Load Heterophilous Graph dataset.
- Returns:
- Dataset
The loaded Heterophilous Graph dataset.
- Raises:
- RuntimeError
If dataset loading fails.
- class MantraSimplicialDatasetLoader(parameters, **kwargs)
Bases: AbstractLoader
Load the MANTRA dataset with configurable parameters.
Note: for simplicial datasets, the class name must include DatasetLoader.
- Parameters:
- parameters : DictConfig
- Configuration parameters containing:
data_dir: Root directory for data
data_name: Name of the dataset
other relevant parameters
- **kwargs : dict
Additional keyword arguments.
- __init__(parameters, **kwargs)
- load_dataset(**kwargs)
Load the MANTRA dataset.
- Parameters:
- **kwargs : dict
Additional keyword arguments for dataset initialization.
- Returns:
- CitationHypergraphDataset
The loaded MANTRA dataset with the appropriate data_dir.
- Raises:
- RuntimeError
If dataset loading fails.
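The naming note for this class ties in with the automated loader discovery mentioned in the package description. One way such name-based discovery could work is sketched below. This is an assumption for illustration, not the package's actual discovery code: discover_loaders, the demo module, and the second class are hypothetical.

```python
import inspect
import types


def discover_loaders(module, marker="DatasetLoader"):
    """Return {name: class} for classes in `module` whose name contains `marker`."""
    return {
        name: obj
        for name, obj in inspect.getmembers(module, inspect.isclass)
        if marker in name
    }


# Stand-in module holding two hypothetical classes.
demo = types.ModuleType("demo_loaders")


class MantraSimplicialDatasetLoader:
    """Name contains the marker, so it is discovered."""


class MantraSimplicial:
    """Name lacks the marker, so it is skipped."""


demo.MantraSimplicialDatasetLoader = MantraSimplicialDatasetLoader
demo.MantraSimplicial = MantraSimplicial

found = discover_loaders(demo)
```

Under this assumption, a loader class whose name omits DatasetLoader would silently go unregistered, which is why the note insists on the naming.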
- class ManualGraphDatasetLoader(parameters)
Bases: AbstractLoader
Load manually provided graph datasets.
- Parameters:
- parameters : DictConfig
- Configuration parameters containing:
data_name: Name of the dataset
data_dir: Root directory for data
- __init__(parameters)
- get_data_dir()
Get the data directory.
- Returns:
- Path
The path to the dataset directory.
- load_dataset()
Load the manual graph dataset.
- Returns:
- DataloadDataset
The dataset object containing the manually loaded graph.
- class MoleculeDatasetLoader(parameters)
Bases: AbstractLoader
Load molecule datasets (ZINC and AQSOL) with predefined splits, or QM9.
- Parameters:
- parameters : DictConfig
- Configuration parameters containing:
data_dir: Root directory for data
data_name: Name of the dataset
data_type: Type of the dataset (e.g., “molecule”)
qm9_target_index: (QM9 only) Which of the 19 regression targets to use (default 0).
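The effect of qm9_target_index can be pictured as picking one column out of the 19 target values stored per graph. The sketch below uses plain Python lists and made-up numbers; select_target is a hypothetical helper, not part of the loader.

```python
def select_target(y_rows, target_index=0, num_targets=19):
    """Keep one regression target per graph from rows of `num_targets` values."""
    if not 0 <= target_index < num_targets:
        raise IndexError(f"target_index must be in [0, {num_targets - 1}]")
    return [row[target_index] for row in y_rows]


# Two made-up graphs, each with 19 placeholder target values.
y_rows = [
    [float(i) for i in range(19)],
    [float(i) * 2 for i in range(19)],
]
selected = select_target(y_rows, target_index=3)
```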
- __init__(parameters)
- get_data_dir()
Get the data directory.
- Returns:
- Path
The path to the dataset directory.
- load_dataset()
Load the molecule dataset with predefined splits.
- Returns:
- Dataset
The combined dataset with predefined splits.
- Raises:
- RuntimeError
If dataset loading fails.
- class OGBGDatasetLoader(parameters)
Bases: AbstractLoader
Load molecule datasets (molhiv, molpcba, ppa) with predefined splits.
- Parameters:
- parameters : DictConfig
- Configuration parameters containing:
data_dir: Root directory for data
data_name: Name of the dataset
data_type: Type of the dataset (e.g., “molecule”)
- __init__(parameters)
- get_data_dir()
Get the data directory.
- Returns:
- Path
The path to the dataset directory.
- load_dataset()
Load the molecule dataset with predefined splits.
- Returns:
- Dataset
The combined dataset with predefined splits.
- Raises:
- RuntimeError
If dataset loading fails.
- class PlanetoidDatasetLoader(parameters)
Bases: AbstractLoader
Load PLANETOID datasets.
- Parameters:
- parameters : DictConfig
- Configuration parameters containing:
data_dir: Root directory for data
data_name: Name of the dataset
data_type: Type of the dataset (e.g., “cocitation”)
- __init__(parameters)
- load_dataset()
Load Planetoid dataset.
- Returns:
- Dataset
The loaded Planetoid dataset.
- Raises:
- RuntimeError
If dataset loading fails.
- class TUDatasetLoader(parameters)
Bases: AbstractLoader
Load TU datasets.
- Parameters:
- parameters : DictConfig
- Configuration parameters containing:
data_dir: Root directory for data
data_name: Name of the dataset
data_type: Type of the dataset (e.g., “graph_classification”)
- __init__(parameters)
- load_dataset()
Load TU dataset.
- Returns:
- Dataset
The loaded TU dataset.
- Raises:
- RuntimeError
If dataset loading fails.
- class USCountyDemosDatasetLoader(parameters)
Bases: AbstractLoader
Load US County Demos dataset with configurable year and task variable.
- Parameters:
- parameters : DictConfig
- Configuration parameters containing:
data_dir: Root directory for data
data_name: Name of the dataset
year: Year of the dataset (if applicable)
task_variable: Task variable for the dataset
- __init__(parameters)
- load_dataset()
Load the US County Demos dataset.
- Returns:
- USCountyDemosDataset
The loaded US County Demos dataset with the appropriate data_dir.
- Raises:
- RuntimeError
If dataset loading fails.
Submodules
- topobench.data.loaders.graph.adme_datasets module
- topobench.data.loaders.graph.graph_universe_loader module
- topobench.data.loaders.graph.hetero_datasets module
- topobench.data.loaders.graph.mantra_dataset module
- topobench.data.loaders.graph.manual_graph_dataset_loader module
- topobench.data.loaders.graph.molecule_datasets module
- topobench.data.loaders.graph.ogbg_datasets module
- topobench.data.loaders.graph.planetoid_datasets module
- topobench.data.loaders.graph.tu_datasets module
- topobench.data.loaders.graph.us_county_demos_dataset_loader module