topobench.data.loaders.graph.ogbg_datasets module#
Loaders for Graph Property Prediction datasets.
- class AbstractLoader(parameters)#
Bases:
ABC
Abstract class that provides an interface to load data.
- Parameters:
- parameters : DictConfig
Configuration parameters.
- __init__(parameters)#
- get_data_dir()#
Get the data directory.
- Returns:
- Path
The path to the dataset directory.
- load(**kwargs)#
Load data.
- Parameters:
- **kwargs : dict
Additional keyword arguments.
- Returns:
- tuple[torch_geometric.data.Data, str]
Tuple containing the loaded data and the data directory.
- abstract load_dataset()#
Load data into a dataset.
- Returns:
- Union[torch_geometric.data.Dataset, torch.utils.data.Dataset]
The loaded dataset, which could be a PyG or PyTorch dataset.
- Raises:
- NotImplementedError
If the method is not implemented.
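The interface above can be sketched with a stand-in class. This is a minimal illustration built only on the standard library, not TopoBench's implementation: the names `SketchLoader` and `InMemoryListLoader` are hypothetical, a plain dict stands in for the DictConfig, and the rule that the dataset directory is `data_dir` joined with `data_name` is an assumption.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class SketchLoader(ABC):
    """Hypothetical stand-in that mirrors the documented interface."""

    def __init__(self, parameters):
        # A plain mapping stands in for the DictConfig here.
        self.parameters = parameters

    def get_data_dir(self) -> Path:
        # Assumption: the dataset directory is data_dir / data_name.
        return Path(self.parameters["data_dir"]) / self.parameters["data_name"]

    @abstractmethod
    def load_dataset(self):
        """Subclasses return the loaded dataset."""
        raise NotImplementedError

    def load(self, **kwargs):
        # kwargs are accepted for interface compatibility and ignored in this sketch.
        dataset = self.load_dataset()
        return dataset, str(self.get_data_dir())


class InMemoryListLoader(SketchLoader):
    """Hypothetical concrete loader that returns a small list as its dataset."""

    def load_dataset(self):
        return [0, 1, 2]


loader = InMemoryListLoader({"data_dir": "datasets", "data_name": "toy"})
dataset, data_dir = loader.load()
```

Because `load_dataset()` is abstract, instantiating `SketchLoader` directly raises `TypeError`; only subclasses that implement it can be constructed.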
- class Dataset(root=None, transform=None, pre_transform=None, pre_filter=None, log=True, force_reload=False)#
Bases:
Dataset
Dataset base class for creating graph datasets. See the accompanying tutorial.
- Parameters:
- root (str, optional) – Root directory where the dataset should be saved. (default: None)
- transform (callable, optional) – A function/transform that takes in a Data or HeteroData object and returns a transformed version. The data object will be transformed before every access. (default: None)
- pre_transform (callable, optional) – A function/transform that takes in a Data or HeteroData object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
- pre_filter (callable, optional) – A function that takes in a Data or HeteroData object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
- log (bool, optional) – Whether to print any console output while downloading and processing the dataset. (default: True)
- force_reload (bool, optional) – Whether to re-process the dataset. (default: False)
- __init__(root=None, transform=None, pre_transform=None, pre_filter=None, log=True, force_reload=False)#
- download()#
Downloads the dataset to the self.raw_dir folder.
- get(idx)#
Gets the data object at index idx.
- get_summary()#
Collects summary statistics for the dataset.
- index_select(idx)#
Creates a subset of the dataset from specified indices idx. Indices idx can be a slicing object, e.g., [2:5], a list, a tuple, or a torch.Tensor or np.ndarray of type long or bool.
- indices()#
- len()#
Returns the number of data objects stored in the dataset.
- print_summary(fmt='psql')#
Prints summary statistics of the dataset to the console.
- process()#
Processes the dataset to the self.processed_dir folder.
- shuffle(return_perm=False)#
Randomly shuffles the examples in the dataset.
- to_datapipe()#
Converts the dataset into a torch.utils.data.DataPipe.
The returned instance can then be used with PyG's built-in DataPipes for batching graphs as follows:

    from torch_geometric.datasets import QM9

    dp = QM9(root='./data/QM9/').to_datapipe()
    dp = dp.batch_graphs(batch_size=2, drop_last=True)

    for batch in dp:
        pass

See the PyTorch tutorial for further background on DataPipes.
- property has_download: bool#
Checks whether the dataset defines a download() method.
- property num_features: int#
Returns the number of features per node in the dataset. Alias for num_node_features.
- property processed_file_names: str | List[str] | Tuple[str, ...]#
The names of the files in the self.processed_dir folder that must be present in order to skip processing.
- property processed_paths: List[str]#
The absolute filepaths that must be present in order to skip processing.
- class DictConfig(content, key=None, parent=None, ref_type=typing.Any, key_type=typing.Any, element_type=typing.Any, is_optional=True, flags=None)#
Bases:
BaseContainer, MutableMapping[Any, Any]
- __init__(content, key=None, parent=None, ref_type=typing.Any, key_type=typing.Any, element_type=typing.Any, is_optional=True, flags=None)#
- copy()#
- get(key, default_value=None)#
Return the value for key if key is in the dictionary, else default_value (defaulting to None).
- items()#
Return a set-like object providing a view on the dictionary's items.
- items_ex(resolve=True, keys=None)#
- keys()#
Return a set-like object providing a view on the dictionary's keys.
- pop(k[, d])#
Remove the specified key and return the corresponding value. If the key is not found, d is returned if given; otherwise KeyError is raised.
- setdefault(k[, d])#
Return D.get(k, d), and also set D[k] = d if k is not in D.
- class OGBGDatasetLoader(parameters)#
Bases:
AbstractLoader
Load molecule datasets (molhiv, molpcba, ppa) with predefined splits.
- Parameters:
- parameters : DictConfig
Configuration parameters containing:
- data_dir: Root directory for data
- data_name: Name of the dataset
- data_type: Type of the dataset (e.g., “molecule”)
- __init__(parameters)#
- get_data_dir()#
Get the data directory.
- Returns:
- Path
The path to the dataset directory.
- load_dataset()#
Load the molecule dataset with predefined splits.
- Returns:
- Dataset
The combined dataset with predefined splits.
- Raises:
- RuntimeError
If dataset loading fails.
- class Path(*args, **kwargs)#
Bases:
PurePath
PurePath subclass that can make system calls.
Path represents a filesystem path but unlike PurePath, also offers methods to do system calls on path objects. Depending on your system, instantiating a Path will return either a PosixPath or a WindowsPath object. You can also instantiate a PosixPath or WindowsPath directly, but cannot instantiate a WindowsPath on a POSIX system or vice versa.
- absolute()#
Return an absolute version of this path by prepending the current working directory. No normalization or symlink resolution is performed.
Use resolve() to get the canonical path to a file.
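The difference between absolute() and resolve() can be illustrated with a short sketch. It uses only the standard library; the temporary directory and the names `base` and `dotted` are illustrative choices.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Resolve the temporary directory first so comparisons use its canonical form.
    base = Path(tmp).resolve()
    (base / "sub").mkdir()
    dotted = base / "sub" / ".." / "file.txt"

    # absolute() performs no normalization, so the ".." component is kept.
    has_dotdot_after_absolute = ".." in dotted.absolute().parts

    # resolve() collapses ".." (and follows symlinks) to give the canonical path.
    resolved_is_canonical = dotted.resolve() == base / "file.txt"
```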
- chmod(mode, *, follow_symlinks=True)#
Change the permissions of the path, like os.chmod().
- classmethod cwd()#
Return a new path pointing to the current working directory (as returned by os.getcwd()).
- exists()#
Whether this path exists.
- expanduser()#
Return a new path with expanded ~ and ~user constructs (as returned by os.path.expanduser).
- glob(pattern)#
Iterate over this subtree and yield all existing files (of any kind, including directories) matching the given relative pattern.
- group()#
Return the group name of the file gid.
- hardlink_to(target)#
Make this path a hard link pointing to the same file as target.
Note the order of arguments (self, target) is the reverse of os.link’s.
- classmethod home()#
Return a new path pointing to the user’s home directory (as returned by os.path.expanduser('~')).
- is_block_device()#
Whether this path is a block device.
- is_char_device()#
Whether this path is a character device.
- is_dir()#
Whether this path is a directory.
- is_fifo()#
Whether this path is a FIFO.
- is_file()#
Whether this path is a regular file (also True for symlinks pointing to regular files).
- is_mount()#
Check if this path is a POSIX mount point.
- is_socket()#
Whether this path is a socket.
- is_symlink()#
Whether this path is a symbolic link.
- iterdir()#
Iterate over the files in this directory. Does not yield any result for the special paths '.' and '..'.
- lchmod(mode)#
Like chmod(), except if the path points to a symlink, the symlink’s permissions are changed, rather than its target’s.
- link_to(target)#
Make the target path a hard link pointing to this path.
Note this function does not make this path a hard link to target, despite the implication of the function and argument names. The order of arguments (target, link) is the reverse of Path.symlink_to, but matches that of os.link.
Deprecated since Python 3.10 and scheduled for removal in Python 3.12. Use hardlink_to() instead.
- lstat()#
Like stat(), except if the path points to a symlink, the symlink’s status information is returned, rather than its target’s.
- mkdir(mode=511, parents=False, exist_ok=False)#
Create a new directory at this given path.
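The default mode=511 in the signature is the decimal rendering of octal 0o777. The sketch below, which uses only the standard library and an illustrative temporary directory, shows how the parents and exist_ok flags change the behaviour of mkdir().

```python
import tempfile
from pathlib import Path

# The rendered default mode=511 is the decimal form of octal 0o777.
default_mode_is_0o777 = (0o777 == 511)

with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / "a" / "b" / "c"

    # Without parents=True, missing intermediate directories raise an error.
    try:
        nested.mkdir()
        missing_parent_failed = False
    except FileNotFoundError:
        missing_parent_failed = True

    # parents=True creates "a" and "a/b" as needed.
    nested.mkdir(parents=True)
    created = nested.is_dir()

    # exist_ok=True turns a repeated call into a no-op.
    nested.mkdir(parents=True, exist_ok=True)

    # With the default exist_ok=False, a repeated call raises FileExistsError.
    try:
        nested.mkdir()
        second_call_failed = False
    except FileExistsError:
        second_call_failed = True
```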
- open(mode='r', buffering=-1, encoding=None, errors=None, newline=None)#
Open the file pointed to by this path and return a file object, as the built-in open() function does.
- owner()#
Return the login name of the file owner.
- read_bytes()#
Open the file in bytes mode, read it, and close the file.
- read_text(encoding=None, errors=None)#
Open the file in text mode, read it, and close the file.
- readlink()#
Return the path to which the symbolic link points.
- rename(target)#
Rename this path to the target path.
The target path may be absolute or relative. Relative paths are interpreted relative to the current working directory, not the directory of the Path object.
Returns the new Path instance pointing to the target path.
- replace(target)#
Rename this path to the target path, overwriting if that path exists.
The target path may be absolute or relative. Relative paths are interpreted relative to the current working directory, not the directory of the Path object.
Returns the new Path instance pointing to the target path.
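Because relative targets are interpreted against the current working directory, one way to keep a file in its own directory is to build the target from the source path. A minimal sketch using only the standard library follows; the file names are illustrative.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    src = root / "new.txt"
    dst = src.with_name("current.txt")  # same directory as src, different name
    src.write_text("new")
    dst.write_text("old")

    # replace() overwrites an existing target and returns the new Path.
    returned = src.replace(dst)

    content = dst.read_text()
    src_still_exists = src.exists()
```

Passing the bare string "current.txt" instead would place the file in the current working directory, not next to `src`.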
- resolve(strict=False)#
Make the path absolute, resolving all symlinks on the way and also normalizing it.
- rglob(pattern)#
Recursively yield all existing files (of any kind, including directories) matching the given relative pattern, anywhere in this subtree.
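The difference between glob() and rglob() is easiest to see on a small tree. The sketch below uses only the standard library; the directory layout and file names are illustrative.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "raw").mkdir()
    (root / "top.txt").write_text("a")
    (root / "raw" / "nested.txt").write_text("b")

    # glob("*.txt") matches only direct children of root.
    top_level = sorted(p.name for p in root.glob("*.txt"))

    # rglob("*.txt") also descends into subdirectories.
    recursive = sorted(p.name for p in root.rglob("*.txt"))
```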
- rmdir()#
Remove this directory. The directory must be empty.
- samefile(other_path)#
Return whether other_path is the same or not as this file (as returned by os.path.samefile()).
- stat(*, follow_symlinks=True)#
Return the result of the stat() system call on this path, like os.stat() does.
- symlink_to(target, target_is_directory=False)#
Make this path a symlink pointing to the target path. Note the order of arguments (link, target) is the reverse of os.symlink.
- touch(mode=438, exist_ok=True)#
Create this file with the given access mode, if it doesn’t exist.
- unlink(missing_ok=False)#
Remove this file or link. If the path is a directory, use rmdir() instead.
- write_bytes(data)#
Open the file in bytes mode, write to it, and close the file.
- write_text(data, encoding=None, errors=None, newline=None)#
Open the file in text mode, write to it, and close the file.
- class PygGraphPropPredDataset(name, root='dataset', transform=None, pre_transform=None, meta_dict=None)#
Bases:
InMemoryDataset
- __init__(name, root='dataset', transform=None, pre_transform=None, meta_dict=None)#
- name (str): name of the dataset
- root (str): root directory to store the dataset folder
- transform, pre_transform (optional): transform/pre-transform graph objects
- meta_dict: dictionary that stores all the meta-information about the data. Default is None, but when something is passed, its information is used. Useful for debugging by external contributors.
- download()#
Downloads the dataset to the self.raw_dir folder.
- get_idx_split(split_type=None)#
- process()#
Processes the dataset to the self.processed_dir folder.
- property num_classes#
Returns the number of classes in the dataset.
- property processed_file_names#
The names of the files in the self.processed_dir folder that must be present in order to skip processing.
- property raw_file_names#
The names of the files in the self.raw_dir folder that must be present in order to skip downloading.
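A possible way to combine PygGraphPropPredDataset with get_idx_split() is sketched below. It is an illustration, not the code used by OGBGDatasetLoader: the helper name `load_with_splits` is hypothetical, and the sketch assumes the `ogb` and `torch_geometric` packages are installed and that the dataset can be downloaded on first use. The import is deferred into the function so that defining the helper has no heavy dependencies.

```python
def load_with_splits(name="ogbg-molhiv", root="dataset"):
    """Hypothetical helper: load an OGB graph dataset and its predefined splits.

    Requires the `ogb` and `torch_geometric` packages; the first call
    downloads and processes the dataset under `root`.
    """
    # Deferred import: only needed when the helper is actually called.
    from ogb.graphproppred import PygGraphPropPredDataset

    dataset = PygGraphPropPredDataset(name=name, root=root)

    # get_idx_split() returns a dict of index tensors keyed by split name
    # ("train", "valid", "test").
    split_idx = dataset.get_idx_split()

    # Indexing the dataset with an index tensor yields the matching subset.
    return {split: dataset[idx] for split, idx in split_idx.items()}
```

Calling `load_with_splits()` would return a dictionary of dataset subsets, one per predefined split.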