topobench.data.loaders.graph.tu_datasets module#
Loaders for TU datasets.
- class AbstractLoader(parameters)#
Bases: ABC
Abstract class that provides an interface to load data.
- Parameters:
- parameters (DictConfig)
Configuration parameters.
- __init__(parameters)#
- get_data_dir()#
Get the data directory.
- Returns:
- Path
The path to the dataset directory.
- load(**kwargs)#
Load data.
- Parameters:
- **kwargs (dict)
Additional keyword arguments.
- Returns:
- tuple[torch_geometric.data.Data, str]
Tuple containing the loaded data and the data directory.
- abstract load_dataset()#
Load data into a dataset.
- Returns:
- Union[torch_geometric.data.Dataset, torch.utils.data.Dataset]
The loaded dataset, which could be a PyG or PyTorch dataset.
- Raises:
- NotImplementedError
If the method is not implemented.
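The interface above can be made concrete with a small sketch. This is a hypothetical stand-in, not the topobench implementation: a plain dict replaces the omegaconf DictConfig, and the toy subclass returns a list instead of a real PyG dataset.

```python
from abc import ABC, abstractmethod
from pathlib import Path

# Sketch of the AbstractLoader contract with a plain dict standing in
# for a DictConfig (hypothetical stand-in, not the real class).
class SketchLoader(ABC):
    def __init__(self, parameters: dict) -> None:
        self.parameters = parameters

    def get_data_dir(self) -> Path:
        # Derive the dataset directory from the configuration.
        return Path(self.parameters["data_dir"])

    @abstractmethod
    def load_dataset(self):
        """Subclasses must return the loaded dataset."""

class ToyLoader(SketchLoader):
    def load_dataset(self):
        # A real subclass would construct a PyG dataset here.
        return ["graph_0", "graph_1"]

loader = ToyLoader({"data_dir": "datasets/graph/TUDataset"})
print(loader.get_data_dir())
print(len(loader.load_dataset()))
```

Calling `load_dataset()` on the abstract base directly is impossible; instantiating `SketchLoader` itself raises `TypeError`, which is the mechanism behind the `NotImplementedError` contract documented above.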
- class Dataset(root=None, transform=None, pre_transform=None, pre_filter=None, log=True, force_reload=False)#
Bases: Dataset
Dataset base class for creating graph datasets. See the accompanying tutorial in the PyG documentation.
- Parameters:
root (str, optional) – Root directory where the dataset should be saved. (default: None)
transform (callable, optional) – A function/transform that takes in a Data or HeteroData object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a Data or HeteroData object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a Data or HeteroData object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
log (bool, optional) – Whether to print any console output while downloading and processing the dataset. (default: True)
force_reload (bool, optional) – Whether to re-process the dataset. (default: False)
- __init__(root=None, transform=None, pre_transform=None, pre_filter=None, log=True, force_reload=False)#
- download()#
Downloads the dataset to the self.raw_dir folder.
- get(idx)#
Gets the data object at index idx.
- get_summary()#
Collects summary statistics for the dataset.
- index_select(idx)#
Creates a subset of the dataset from specified indices idx. Indices idx can be a slicing object, e.g., [2:5], a list, a tuple, or a torch.Tensor or np.ndarray of type long or bool.
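The accepted index types follow standard fancy-indexing semantics. A small pure-Python illustration of the same behavior over a toy list of "graphs" (not using torch_geometric itself):

```python
# Emulate index_select's index handling: slice, boolean mask, or
# a list/tuple of integer indices.
data = ["g0", "g1", "g2", "g3", "g4", "g5"]

def index_select(data, idx):
    if isinstance(idx, slice):                 # e.g. slice(2, 5) for [2:5]
        return data[idx]
    if all(isinstance(i, bool) for i in idx):  # boolean mask (check before int!)
        return [d for d, keep in zip(data, idx) if keep]
    return [data[i] for i in idx]              # integer ("long") indices

print(index_select(data, slice(2, 5)))   # ['g2', 'g3', 'g4']
print(index_select(data, [0, 5]))        # ['g0', 'g5']
print(index_select(data, [True, False, True, False, False, False]))  # ['g0', 'g2']
```

Note that `bool` is a subclass of `int` in Python, so the mask case must be checked before treating entries as integer indices.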
- indices()#
- len()#
Returns the number of data objects stored in the dataset.
- print_summary(fmt='psql')#
Prints summary statistics of the dataset to the console.
- process()#
Processes the dataset to the self.processed_dir folder.
- shuffle(return_perm=False)#
Randomly shuffles the examples in the dataset.
- to_datapipe()#
Converts the dataset into a torch.utils.data.DataPipe.
The returned instance can then be used with PyG's built-in DataPipes for batching graphs as follows:

```python
from torch_geometric.datasets import QM9

dp = QM9(root='./data/QM9/').to_datapipe()
dp = dp.batch_graphs(batch_size=2, drop_last=True)

for batch in dp:
    pass
```
See the PyTorch tutorial for further background on DataPipes.
- property has_download: bool#
Checks whether the dataset defines a download() method.
- property num_features: int#
Returns the number of features per node in the dataset. Alias for num_node_features.
- property processed_file_names: str | List[str] | Tuple[str, ...]#
The name of the files in the self.processed_dir folder that must be present in order to skip processing.
- property processed_paths: List[str]#
The absolute filepaths that must be present in order to skip processing.
- class DictConfig(content, key=None, parent=None, ref_type=typing.Any, key_type=typing.Any, element_type=typing.Any, is_optional=True, flags=None)#
Bases: BaseContainer, MutableMapping[Any, Any]
- __init__(content, key=None, parent=None, ref_type=typing.Any, key_type=typing.Any, element_type=typing.Any, is_optional=True, flags=None)#
- copy()#
- get(key, default_value=None)#
Return the value for key if key is in the dictionary, else default_value (defaulting to None).
- items() → a set-like object providing a view on D's items#
- items_ex(resolve=True, keys=None)#
- keys() → a set-like object providing a view on D's keys#
- pop(k[, d]) → v, remove specified key and return the corresponding value.#
If key is not found, d is returned if given, otherwise KeyError is raised.
- setdefault(k[, d]) → D.get(k,d), also set D[k]=d if k not in D#
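Since DictConfig implements the standard MutableMapping protocol, its get, setdefault, and pop accessors mirror their dict counterparts. A quick illustration of those semantics using a plain dict (the keys are hypothetical, chosen to resemble a loader configuration):

```python
# dict follows the same MutableMapping accessor semantics as DictConfig.
cfg = {"data_dir": "datasets", "data_name": "MUTAG"}

print(cfg.get("data_name"))            # 'MUTAG'
print(cfg.get("missing", "fallback"))  # key absent -> default is returned

cfg.setdefault("data_type", "graph_classification")  # inserted: key was absent
cfg.setdefault("data_name", "PROTEINS")              # no-op: key already present
print(cfg["data_type"], cfg["data_name"])            # graph_classification MUTAG

removed = cfg.pop("data_dir")          # removes the key, returns its value
print(removed, "data_dir" in cfg)      # datasets False
```

One difference to keep in mind: DictConfig's get resolves interpolations and honors struct/readonly flags, so the plain-dict analogy covers the access pattern but not OmegaConf's extra configuration semantics.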
- class TUDataset(root, name, transform=None, pre_transform=None, pre_filter=None, force_reload=False, use_node_attr=False, use_edge_attr=False, cleaned=False)#
Bases: InMemoryDataset
A variety of graph kernel benchmark datasets, e.g., "IMDB-BINARY", "REDDIT-BINARY" or "PROTEINS", collected from the TU Dortmund University. In addition, this dataset wrapper provides cleaned dataset versions as motivated by the "Understanding Isomorphism Bias in Graph Data Sets" paper, containing only non-isomorphic graphs.
Note
Some datasets may not come with any node labels. You can then either make use of the argument use_node_attr to load additional continuous node attributes (if present) or provide synthetic node features using transforms such as torch_geometric.transforms.Constant or torch_geometric.transforms.OneHotDegree.
- Parameters:
root (str) – Root directory where the dataset should be saved.
name (str) – The name of the dataset.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
force_reload (bool, optional) – Whether to re-process the dataset. (default: False)
use_node_attr (bool, optional) – If True, the dataset will contain additional continuous node attributes (if present). (default: False)
use_edge_attr (bool, optional) – If True, the dataset will contain additional continuous edge attributes (if present). (default: False)
cleaned (bool, optional) – If True, the dataset will contain only non-isomorphic graphs. (default: False)
STATS:

| Name | #graphs | #nodes | #edges | #features | #classes |
|---|---|---|---|---|---|
| MUTAG | 188 | ~17.9 | ~39.6 | 7 | 2 |
| ENZYMES | 600 | ~32.6 | ~124.3 | 3 | 6 |
| PROTEINS | 1,113 | ~39.1 | ~145.6 | 3 | 2 |
| COLLAB | 5,000 | ~74.5 | ~4914.4 | 0 | 3 |
| IMDB-BINARY | 1,000 | ~19.8 | ~193.1 | 0 | 2 |
| REDDIT-BINARY | 2,000 | ~429.6 | ~995.5 | 0 | 2 |
| … | | | | | |
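The #nodes and #edges columns above are per-graph averages over the whole dataset. A minimal sketch of how such summary statistics would be computed from per-graph counts; the numbers below are made up for illustration, not real MUTAG values:

```python
# Hypothetical (num_nodes, num_edges) pairs for a toy 4-graph dataset.
graph_sizes = [(12, 26), (18, 40), (20, 44), (22, 48)]

num_graphs = len(graph_sizes)
avg_nodes = sum(n for n, _ in graph_sizes) / num_graphs
avg_edges = sum(m for _, m in graph_sizes) / num_graphs
print(f"#graphs={num_graphs}  #nodes~{avg_nodes:.1f}  #edges~{avg_edges:.1f}")
```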
- __init__(root, name, transform=None, pre_transform=None, pre_filter=None, force_reload=False, use_node_attr=False, use_edge_attr=False, cleaned=False)#
- download()#
Downloads the dataset to the self.raw_dir folder.
- process()#
Processes the dataset to the self.processed_dir folder.
- cleaned_url = 'https://raw.githubusercontent.com/nd7141/graph_datasets/master/datasets'#
- property processed_file_names: str#
The name of the files in the self.processed_dir folder that must be present in order to skip processing.
- property raw_file_names: List[str]#
The name of the files in the self.raw_dir folder that must be present in order to skip downloading.
- url = 'https://www.chrsmrrs.com/graphkerneldatasets'#
- class TUDatasetLoader(parameters)#
Bases: AbstractLoader
Load TU datasets.
- Parameters:
- parameters (DictConfig)
- Configuration parameters containing:
data_dir: Root directory for data
data_name: Name of the dataset
data_type: Type of the dataset (e.g., “graph_classification”)
- __init__(parameters)#
- load_dataset()#
Load TU dataset.
- Returns:
- Dataset
The loaded TU dataset.
- Raises:
- RuntimeError
If dataset loading fails.
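The loader's flow can be sketched end to end. Everything below is a hypothetical stand-in: a plain dict replaces the DictConfig, and a stub dict replaces the TUDataset that the real loader would construct (so no download or torch_geometric install is needed), while preserving the documented RuntimeError contract:

```python
from pathlib import Path

# Stub sketch of the TUDatasetLoader flow (hypothetical, not the real class).
class StubTUDatasetLoader:
    def __init__(self, parameters: dict) -> None:
        self.parameters = parameters

    def get_data_dir(self) -> Path:
        # Real loaders nest the dataset under the configured root.
        return Path(self.parameters["data_dir"]) / self.parameters["data_name"]

    def load_dataset(self):
        try:
            # A real loader would call torch_geometric.datasets.TUDataset here.
            return {"name": self.parameters["data_name"],
                    "root": str(self.get_data_dir())}
        except KeyError as err:
            # Surface failures as RuntimeError, per the documented contract.
            raise RuntimeError(f"Dataset loading failed: {err}") from err

params = {"data_dir": "datasets/graph", "data_name": "MUTAG",
          "data_type": "graph_classification"}
dataset = StubTUDatasetLoader(params).load_dataset()
print(dataset["name"], dataset["root"])
```

With an incomplete configuration (e.g., a missing data_name key), load_dataset raises RuntimeError instead of leaking the underlying KeyError.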