topobench.data.loaders.graph.planetoid_datasets module#
Loaders for PLANETOID datasets.
- class AbstractLoader(parameters)#
Bases: ABC

Abstract class that provides an interface to load data.
- Parameters:
- parameters : DictConfig
Configuration parameters.
- __init__(parameters)#
- get_data_dir()#
Get the data directory.
- Returns:
- Path
The path to the dataset directory.
- load(**kwargs)#
Load data.
- Parameters:
- **kwargs : dict
Additional keyword arguments.
- Returns:
- tuple[torch_geometric.data.Data, str]
Tuple containing the loaded data and the data directory.
- abstract load_dataset()#
Load data into a dataset.
- Returns:
- Union[torch_geometric.data.Dataset, torch.utils.data.Dataset]
The loaded dataset, which could be a PyG or PyTorch dataset.
- Raises:
- NotImplementedError
If the method is not implemented.
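The interface above can be sketched with the standard library alone. The class and method names below mirror the documented interface, but the bodies are illustrative stand-ins, not topobench's actual implementation:

```python
from abc import ABC, abstractmethod
from pathlib import Path


class LoaderSketch(ABC):
    """Stdlib-only stand-in for AbstractLoader (illustrative only)."""

    def __init__(self, parameters: dict):
        self.parameters = parameters

    def get_data_dir(self) -> Path:
        # Resolve the dataset directory from the configuration.
        return Path(self.parameters["data_dir"]) / self.parameters["data_name"]

    def load(self, **kwargs):
        # Delegate to the subclass hook and pair the result with its directory.
        return self.load_dataset(), self.get_data_dir()

    @abstractmethod
    def load_dataset(self):
        """Subclasses must return the concrete dataset object."""


class ToyLoader(LoaderSketch):
    def load_dataset(self):
        # Placeholder for a real PyG / PyTorch dataset.
        return ["graph_0", "graph_1"]


loader = ToyLoader({"data_dir": "./data", "data_name": "Cora"})
dataset, data_dir = loader.load()
```

Concrete loaders such as `PlanetoidDatasetLoader` below only need to override `load_dataset()`; `load()` and `get_data_dir()` come from the base class.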
- class Dataset(root=None, transform=None, pre_transform=None, pre_filter=None, log=True, force_reload=False)#
Bases:
Dataset

Dataset base class for creating graph datasets. See the torch_geometric documentation for the accompanying tutorial.
- Parameters:
- root (str, optional) – Root directory where the dataset should be saved. (default: None)
- transform (callable, optional) – A function/transform that takes in a Data or HeteroData object and returns a transformed version. The data object will be transformed before every access. (default: None)
- pre_transform (callable, optional) – A function/transform that takes in a Data or HeteroData object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
- pre_filter (callable, optional) – A function that takes in a Data or HeteroData object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
- log (bool, optional) – Whether to print any console output while downloading and processing the dataset. (default: True)
- force_reload (bool, optional) – Whether to re-process the dataset. (default: False)
- __init__(root=None, transform=None, pre_transform=None, pre_filter=None, log=True, force_reload=False)#
- download()#
Downloads the dataset to the self.raw_dir folder.
- get(idx)#
Gets the data object at index idx.
- get_summary()#
Collects summary statistics for the dataset.
- index_select(idx)#
Creates a subset of the dataset from specified indices idx. Indices idx can be a slicing object, e.g., [2:5], a list, a tuple, or a torch.Tensor or np.ndarray of type long or bool.
- indices()#
- len()#
Returns the number of data objects stored in the dataset.
- print_summary(fmt='psql')#
Prints summary statistics of the dataset to the console.
- process()#
Processes the dataset to the self.processed_dir folder.
- shuffle(return_perm=False)#
Randomly shuffles the examples in the dataset.
- to_datapipe()#
Converts the dataset into a torch.utils.data.DataPipe.

The returned instance can then be used with PyG's built-in DataPipes for batching graphs as follows:

```python
from torch_geometric.datasets import QM9

dp = QM9(root='./data/QM9/').to_datapipe()
dp = dp.batch_graphs(batch_size=2, drop_last=True)

for batch in dp:
    pass
```
See the PyTorch tutorial for further background on DataPipes.
- property has_download: bool#
Checks whether the dataset defines a download() method.
- property num_features: int#
Returns the number of features per node in the dataset. Alias for num_node_features.
- property processed_file_names: str | List[str] | Tuple[str, ...]#
The name of the files in the self.processed_dir folder that must be present in order to skip processing.
- property processed_paths: List[str]#
The absolute filepaths that must be present in order to skip processing.
- class DictConfig(content, key=None, parent=None, ref_type=typing.Any, key_type=typing.Any, element_type=typing.Any, is_optional=True, flags=None)#
Bases:
BaseContainer, MutableMapping[Any, Any]
- __init__(content, key=None, parent=None, ref_type=typing.Any, key_type=typing.Any, element_type=typing.Any, is_optional=True, flags=None)#
- copy()#
- get(key, default_value=None)#
Return the value for key if key is in the dictionary, else default_value (defaulting to None).
- items() → a set-like object providing a view on D's items#
- items_ex(resolve=True, keys=None)#
- keys() → a set-like object providing a view on D's keys#
- pop(k[, d]) → v, remove specified key and return the corresponding value.#
If key is not found, d is returned if given, otherwise KeyError is raised.
- setdefault(k[, d]) → D.get(k,d), also set D[k]=d if k not in D#
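Since DictConfig implements MutableMapping, its get, setdefault, and pop methods follow the same semantics as Python's built-in dict. A plain-dict sketch of those semantics (no omegaconf dependency; the keys are made-up config fields):

```python
d = {"data_dir": "./data", "data_name": "Cora"}

# get: value if the key is present, else the default (None by default)
assert d.get("data_name") == "Cora"
assert d.get("missing") is None
assert d.get("missing", "fallback") == "fallback"

# setdefault: return the existing value, or insert and return the default
assert d.setdefault("data_name", "PubMed") == "Cora"            # unchanged
assert d.setdefault("data_type", "cocitation") == "cocitation"  # inserted

# pop: remove and return the value; KeyError if absent and no default given
assert d.pop("data_type") == "cocitation"
assert d.pop("data_type", None) is None
```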
- class Planetoid(root, name, split='public', num_train_per_class=20, num_val=500, num_test=1000, transform=None, pre_transform=None, force_reload=False)#
Bases:
InMemoryDataset

The citation network datasets "Cora", "CiteSeer" and "PubMed" from the “Revisiting Semi-Supervised Learning with Graph Embeddings” paper. Nodes represent documents and edges represent citation links. Training, validation and test splits are given by binary masks.
- Parameters:
- root (str) – Root directory where the dataset should be saved.
- name (str) – The name of the dataset ("Cora", "CiteSeer", "PubMed").
- split (str, optional) – The type of dataset split ("public", "full", "geom-gcn", "random"). If set to "public", the split will be the public fixed split from the “Revisiting Semi-Supervised Learning with Graph Embeddings” paper. If set to "full", all nodes except those in the validation and test sets will be used for training (as in the “FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling” paper). If set to "geom-gcn", the 10 public fixed splits from the “Geom-GCN: Geometric Graph Convolutional Networks” paper are given. If set to "random", train, validation, and test sets will be randomly generated, according to num_train_per_class, num_val and num_test. (default: "public")
- num_train_per_class (int, optional) – The number of training samples per class in case of "random" split. (default: 20)
- num_val (int, optional) – The number of validation samples in case of "random" split. (default: 500)
- num_test (int, optional) – The number of test samples in case of "random" split. (default: 1000)
- transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
- pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
- force_reload (bool, optional) – Whether to re-process the dataset. (default: False)
STATS:

| Name     | #nodes | #edges | #features | #classes |
|----------|--------|--------|-----------|----------|
| Cora     | 2,708  | 10,556 | 1,433     | 7        |
| CiteSeer | 3,327  | 9,104  | 3,703     | 6        |
| PubMed   | 19,717 | 88,648 | 500       | 3        |
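The "random" split described in the parameters above can be sketched in plain Python (illustrative only; the actual implementation builds torch boolean masks). With Cora-sized defaults it yields exactly 140 training nodes, i.e. 20 per class across 7 classes:

```python
import random

random.seed(0)

# Cora-sized example of the "random" split logic.
num_nodes, num_classes = 2708, 7
num_train_per_class, num_val, num_test = 20, 500, 1000

# Synthetic labels stand in for the real node classes.
labels = [random.randrange(num_classes) for _ in range(num_nodes)]

# Sample num_train_per_class training nodes from each class.
train = set()
for c in range(num_classes):
    class_idx = [i for i, y in enumerate(labels) if y == c]
    train.update(random.sample(class_idx, num_train_per_class))

# Validation and test sets are drawn from the remaining nodes.
remaining = [i for i in range(num_nodes) if i not in train]
random.shuffle(remaining)
val = set(remaining[:num_val])
test = set(remaining[num_val:num_val + num_test])

print(len(train), len(val), len(test))  # 140 500 1000
```

The three sets are disjoint by construction, mirroring the binary train/val/test masks the dataset attaches to each node.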
- __init__(root, name, split='public', num_train_per_class=20, num_val=500, num_test=1000, transform=None, pre_transform=None, force_reload=False)#
- download()#
Downloads the dataset to the self.raw_dir folder.
- process()#
Processes the dataset to the self.processed_dir folder.
- geom_gcn_url = 'https://raw.githubusercontent.com/graphdml-uiuc-jlu/geom-gcn/master'#
- property processed_file_names: str#
The name of the files in the self.processed_dir folder that must be present in order to skip processing.
- property raw_file_names: List[str]#
The name of the files in the self.raw_dir folder that must be present in order to skip downloading.
- url = 'https://github.com/kimiyoung/planetoid/raw/master/data'#
- class PlanetoidDatasetLoader(parameters)#
Bases:
AbstractLoader

Load PLANETOID datasets.
- Parameters:
- parameters : DictConfig
Configuration parameters containing:
- data_dir: Root directory for data
- data_name: Name of the dataset
- data_type: Type of the dataset (e.g., "cocitation")
- __init__(parameters)#
- load_dataset()#
Load Planetoid dataset.
- Returns:
- Dataset
The loaded Planetoid dataset.
- Raises:
- RuntimeError
If dataset loading fails.
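A hedged sketch of the load/raise contract documented above. Names and structure are illustrative, not topobench's actual code; the real load_dataset constructs a torch_geometric Planetoid instance from the configuration:

```python
from pathlib import Path


def load_planetoid_sketch(parameters: dict) -> dict:
    """Illustrative stand-in for PlanetoidDatasetLoader.load_dataset()."""
    try:
        if parameters["data_name"] not in {"Cora", "CiteSeer", "PubMed"}:
            raise ValueError(f"unknown dataset {parameters['data_name']!r}")
        root = Path(parameters["data_dir"]) / parameters["data_name"]
        # The real loader would build Planetoid(root=..., name=...) here.
        return {"root": str(root), "name": parameters["data_name"]}
    except Exception as exc:
        # Surface any failure as the documented RuntimeError.
        raise RuntimeError(f"Failed to load dataset: {exc}") from exc


dataset = load_planetoid_sketch({"data_dir": "./data", "data_name": "Cora"})
print(dataset["name"])  # Cora
```

Wrapping the underlying exception (`from exc`) preserves the original traceback while giving callers the single RuntimeError type the docstring promises.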