topobench.data.loaders.graph.molecule_datasets module

Loaders for Molecule datasets (ZINC, AQSOL, and QM9).

class AQSOL(root, split='train', transform=None, pre_transform=None, pre_filter=None, force_reload=False)

Bases: InMemoryDataset

The AQSOL dataset from the “Benchmarking Graph Neural Networks” paper, based on AqSolDB, a standardized database of 9,982 molecular graphs with their aqueous solubility values, collected from 9 different data sources.

The aqueous solubility targets are collected from experimental measurements and standardized to LogS units in AqSolDB. These final values are the property to regress in the AQSOL dataset. After filtering out a few graphs with no bonds (edges), the total number of molecular graphs is 9,833. For each molecular graph, the node features are the types of heavy atoms and the edge features are the types of bonds between them, as in the ZINC dataset.

Parameters:

root (str) – Root directory where the dataset should be saved.
split (str, optional) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. (default: "train")
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
force_reload (bool, optional) – Whether to re-process the dataset. (default: False)

STATS:

#graphs	#nodes	#edges	#features	#classes
9,833	~17.6	~35.8	1	1
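The filtering step described above (dropping molecules with no bonds) is the kind of predicate the pre_filter argument accepts. A minimal stdlib sketch, assuming a hypothetical stand-in class ToyGraph whose edge_index is two parallel Python lists rather than a real torch_geometric.data.Data object; it is not the code AQSOL actually uses:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyGraph:
    """Hypothetical stand-in for torch_geometric.data.Data, used only for illustration."""

    num_nodes: int
    # [sources, targets]: in this toy, each bond is stored once per direction
    edge_index: List[List[int]] = field(default_factory=lambda: [[], []])


def has_bonds(graph: ToyGraph) -> bool:
    """Pre-filter predicate: keep a molecule only if it has at least one edge."""
    return len(graph.edge_index[0]) > 0


graphs = [
    ToyGraph(num_nodes=1),                               # isolated atom: no bonds
    ToyGraph(num_nodes=2, edge_index=[[0, 1], [1, 0]]),  # one bond stored in both directions
]
kept = [g for g in graphs if has_bonds(g)]
print(len(kept))  # 1
```

With a real Data object, an equivalent predicate could check data.num_edges > 0.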

__init__(root, split='train', transform=None, pre_transform=None, pre_filter=None, force_reload=False)

atoms()

bonds()

download()

Downloads the dataset to the self.raw_dir folder.

process()

Processes the dataset to the self.processed_dir folder.

property processed_file_names: List[str]

The names of the files in the self.processed_dir folder that must be present in order to skip processing.

property raw_file_names: List[str]

The names of the files in the self.raw_dir folder that must be present in order to skip downloading.

url = 'https://www.dropbox.com/s/lzu9lmukwov12kt/aqsol_graph_raw.zip?dl=1'

class AbstractLoader(parameters)

Bases: ABC

Abstract class that provides an interface to load data.

Parameters:

parameters (DictConfig) – Configuration parameters.

__init__(parameters)

get_data_dir()

Get the data directory.

Returns:

Path: The path to the dataset directory.

load(**kwargs)

Load data.

Parameters:

**kwargs (dict) – Additional keyword arguments.

Returns:

tuple[torch_geometric.data.Data, str]: Tuple containing the loaded data and the data directory.

abstractmethod load_dataset()

Load data into a dataset.

Returns:

Union[torch_geometric.data.Dataset, torch.utils.data.Dataset]: The loaded dataset, which could be a PyG or PyTorch dataset.

Raises:

NotImplementedError: If the method is not implemented.
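The load()/load_dataset() pair follows the usual template-method pattern: a subclass implements only load_dataset(), and load() wraps it and also returns the data directory. A minimal stdlib sketch of that pattern, assuming hypothetical class names, a plain dict instead of a DictConfig, and a data_dir/data_name directory layout; it is not TopoBench's implementation:

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Dict, Tuple


class SketchLoader(ABC):
    """Hypothetical re-creation of the AbstractLoader interface (illustrative only)."""

    def __init__(self, parameters: Dict[str, Any]) -> None:
        self.parameters = parameters

    def get_data_dir(self) -> Path:
        # Assumption: the dataset lives under data_dir/data_name
        return Path(self.parameters["data_dir"]) / self.parameters["data_name"]

    @abstractmethod
    def load_dataset(self) -> Any:
        """Subclasses must return the loaded dataset."""
        raise NotImplementedError

    def load(self, **kwargs: Any) -> Tuple[Any, str]:
        # Return the dataset together with its directory, mirroring the documented tuple
        return self.load_dataset(), str(self.get_data_dir())


class ListLoader(SketchLoader):
    """Toy subclass whose 'dataset' is just a list of integers."""

    def load_dataset(self) -> Any:
        return [1, 2, 3]


loader = ListLoader({"data_dir": "datasets", "data_name": "toy"})
dataset, data_dir = loader.load()
```

Because load_dataset() is abstract, Python refuses to instantiate SketchLoader itself, which enforces the contract at construction time rather than at the first call.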

class Dataset(root=None, transform=None, pre_transform=None, pre_filter=None, log=True, force_reload=False)

Bases: torch.utils.data.Dataset

Dataset base class for creating graph datasets. See the PyG documentation for the accompanying tutorial on creating your own datasets.

Parameters:

root (str, optional) – Root directory where the dataset should be saved. (default: None)
transform (callable, optional) – A function/transform that takes in a Data or HeteroData object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a Data or HeteroData object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a Data or HeteroData object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
log (bool, optional) – Whether to print any console output while downloading and processing the dataset. (default: True)
force_reload (bool, optional) – Whether to re-process the dataset. (default: False)

__init__(root=None, transform=None, pre_transform=None, pre_filter=None, log=True, force_reload=False)

download()

Downloads the dataset to the self.raw_dir folder.

get(idx)

Gets the data object at index idx.

get_summary()

Collects summary statistics for the dataset.

index_select(idx)

Creates a subset of the dataset from specified indices idx. Indices idx can be a slicing object, e.g., [2:5], a list, a tuple, or a torch.Tensor or np.ndarray of type long or bool.
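To make the accepted index types concrete, here is a minimal stdlib sketch of turning a slice, an integer list or tuple, or a boolean mask into explicit positions. The helper resolve_indices is hypothetical and works on plain Python sequences instead of torch.Tensor or np.ndarray:

```python
from typing import List, Sequence, Union

Index = Union[slice, Sequence[int], Sequence[bool]]


def resolve_indices(num_items: int, idx: Index) -> List[int]:
    """Convert a slice, an int sequence, or a bool mask into a list of positions."""
    if isinstance(idx, slice):
        return list(range(num_items))[idx]
    idx = list(idx)
    # bool is a subclass of int, so test for a mask before treating entries as ints
    if idx and all(isinstance(i, bool) for i in idx):
        if len(idx) != num_items:
            raise IndexError("boolean mask must have one entry per item")
        return [i for i, keep in enumerate(idx) if keep]
    return [int(i) for i in idx]


print(resolve_indices(10, slice(2, 5)))               # [2, 3, 4]
print(resolve_indices(4, (3, 0)))                      # [3, 0]
print(resolve_indices(4, [True, False, True, False]))  # [0, 2]
```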

indices()

len()

Returns the number of data objects stored in the dataset.

print_summary(fmt='psql')

Prints summary statistics of the dataset to the console.

Parameters: fmt (str, optional) – Summary table format. The available formats are listed in the python-tabulate documentation. (default: "psql")

process()

Processes the dataset to the self.processed_dir folder.

shuffle(return_perm=False)

Randomly shuffles the examples in the dataset.

Parameters: return_perm (bool, optional) – If set to True, will also return the random permutation used to shuffle the dataset. (default: False)
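Returning the permutation is useful when other per-example data (for example, a separate list of labels) must be reordered in exactly the same way. A minimal stdlib sketch of that idea; the helper shuffle_with_perm is hypothetical and is not how PyG implements shuffle():

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def shuffle_with_perm(items: Sequence[T], seed: int = 0) -> Tuple[List[T], List[int]]:
    """Return a shuffled copy of items and the permutation that produced it."""
    perm = list(range(len(items)))
    random.Random(seed).shuffle(perm)  # seeded generator for reproducibility
    return [items[i] for i in perm], perm


shuffled, perm = shuffle_with_perm(["a", "b", "c", "d"], seed=42)
# shuffled[k] is items[perm[k]], so perm can be reapplied to any parallel list
```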

to_datapipe()

Converts the dataset into a torch.utils.data.DataPipe.

The returned instance can then be used with PyG's built-in DataPipes for batching graphs as follows:

from torch_geometric.datasets import QM9

dp = QM9(root='./data/QM9/').to_datapipe()
dp = dp.batch_graphs(batch_size=2, drop_last=True)

for batch in dp:
    pass

See the PyTorch tutorial for further background on DataPipes.

property has_download: bool

Checks whether the dataset defines a download() method.

property has_process: bool

Checks whether the dataset defines a process() method.

property num_classes: int

Returns the number of classes in the dataset.

property num_edge_features: int

Returns the number of features per edge in the dataset.

property num_features: int

Returns the number of features per node in the dataset. Alias for num_node_features.

property num_node_features: int

Returns the number of features per node in the dataset.

property processed_dir: str

The folder in which processed files are stored.

property processed_file_names: str | List[str] | Tuple[str, ...]

The names of the files in the self.processed_dir folder that must be present in order to skip processing.

property processed_paths: List[str]

The absolute filepaths that must be present in order to skip processing.

property raw_dir: str

The folder in which raw (downloaded) files are stored.

property raw_file_names: str | List[str] | Tuple[str, ...]

The names of the files in the self.raw_dir folder that must be present in order to skip downloading.

property raw_paths: List[str]

The absolute filepaths that must be present in order to skip downloading.
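Together, the *_file_names and *_paths properties drive the skip logic: if every expected file already exists, downloading or processing can be skipped. A minimal pathlib sketch of that check, assuming a hypothetical helper all_files_present and hypothetical file names; the real base class may perform additional checks (for example, honoring force_reload):

```python
import tempfile
from pathlib import Path
from typing import Sequence


def all_files_present(folder: Path, file_names: Sequence[str]) -> bool:
    """Return True when every expected file exists inside folder."""
    # Build absolute paths from the folder and the file names, as processed_paths does conceptually
    paths = [(folder / name).resolve() for name in file_names]
    return all(p.exists() for p in paths)


with tempfile.TemporaryDirectory() as tmp:
    processed_dir = Path(tmp) / "processed"
    processed_dir.mkdir()
    expected = ["train.pt", "val.pt", "test.pt"]  # hypothetical file names
    before = all_files_present(processed_dir, expected)  # nothing processed yet
    for name in expected:
        (processed_dir / name).touch()
    after = all_files_present(processed_dir, expected)   # processing could now be skipped
```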

class DictConfig(content, key=None, parent=None, ref_type=typing.Any, key_type=typing.Any, element_type=typing.Any, is_optional=True, flags=None)

Bases: BaseContainer, MutableMapping[Any, Any]

__init__(content, key=None, parent=None, ref_type=typing.Any, key_type=typing.Any, element_type=typing.Any, is_optional=True, flags=None)

copy()

get(key, default_value=None)

Return the value for key if key is in the dictionary, else default_value (defaulting to None).

items() → a set-like object providing a view on D's items

items_ex(resolve=True, keys=None)

keys() → a set-like object providing a view on D's keys

pop(k[, d]) → v, remove specified key and return the corresponding value.

If key is not found, d is returned if given, otherwise KeyError is raised.

setdefault(k[, d]) → D.get(k,d), also set D[k]=d if k not in D

class MoleculeDatasetLoader(parameters)

Bases: AbstractLoader

Load molecule datasets (ZINC and AQSOL) with predefined splits, or QM9.

Parameters:

parameters (DictConfig)

Configuration parameters containing:

data_dir: Root directory for data.
data_name: Name of the dataset.
data_type: Type of the dataset (e.g., “molecule”).
qm9_target_index: (QM9 only) Which of the 19 regression targets to use (default: 0).

__init__(parameters)

get_data_dir()

Get the data directory.

Returns:

Path: The path to the dataset directory.

load_dataset()

Load the molecule dataset with predefined splits.

Returns:

Dataset: The combined dataset with predefined splits.

Raises:

RuntimeError: If dataset loading fails.
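One way to picture "the combined dataset with predefined splits" is to concatenate the train, val and test parts and remember which positions belong to which split. A minimal sketch under that assumption; combine_splits and the split_idx key names are hypothetical and not necessarily what MoleculeDatasetLoader stores:

```python
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def combine_splits(
    train: Sequence[T], val: Sequence[T], test: Sequence[T]
) -> Tuple[List[T], Dict[str, List[int]]]:
    """Concatenate three splits and record the index range of each one."""
    combined: List[T] = []
    split_idx: Dict[str, List[int]] = {}
    for name, part in (("train", train), ("valid", val), ("test", test)):
        start = len(combined)
        combined.extend(part)
        split_idx[name] = list(range(start, len(combined)))
    return combined, split_idx


data, split_idx = combine_splits(["g0", "g1", "g2"], ["g3"], ["g4", "g5"])
```

Keeping indices instead of three separate datasets lets a single dataset object be shared while the predefined split boundaries are preserved.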

class OmegaConf

Bases: object

OmegaConf primary class.

classmethod clear_resolver(name)

Clear (remove) a resolver, but only if it exists.

Returns a bool: True if the resolver was removed and False if it was not.

Parameters: name (str) – Name of the resolver.
Returns: A bool (True if the resolver was removed, False if it was not found).
Return type: bool

classmethod has_resolver(name)

static clear_cache(conf)

static clear_resolvers()

Clear (remove) all OmegaConf resolvers, then re-register OmegaConf’s default resolvers.

static copy_cache(from_config, to_config)

static create(obj=_DEFAULT_MARKER_, parent=None, flags=None)

static from_cli(args_list=None)

Creates a config from the content of sys.argv, or from args_list if it is not None.

static from_dotlist(dotlist)

Creates a config from a list of dotlist-style strings.

Parameters: dotlist (List[str]) – A list of dotlist-style strings, e.g. ["foo.bar=1", "baz=qux"].
Returns: A DictConfig object created from the dotlist.
Return type: DictConfig
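To illustrate the dotlist format, here is a stdlib-only sketch that turns dotted key=value strings into nested dictionaries. The helper dotlist_to_dict is hypothetical: unlike OmegaConf.from_dotlist, it keeps every value as a string (no type conversion) and returns a plain dict rather than a DictConfig:

```python
from typing import Any, Dict, List


def dotlist_to_dict(dotlist: List[str]) -> Dict[str, Any]:
    """Build a nested dict from strings such as 'foo.bar=1' (values kept as strings)."""
    result: Dict[str, Any] = {}
    for item in dotlist:
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        *parents, leaf = key.split(".")
        node = result
        for part in parents:
            node = node.setdefault(part, {})  # create intermediate levels on demand
        node[leaf] = value
    return result


cfg = dotlist_to_dict(["foo.bar=1", "baz=qux"])
# cfg == {"foo": {"bar": "1"}, "baz": "qux"}
```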

static get_cache(conf)

static get_type(obj, key=None)

static is_config(obj)

static is_dict(obj)

static is_interpolation(node, key=None)

static is_list(obj)

static is_missing(cfg, key)

static is_readonly(conf)

static is_struct(conf)

static legacy_register_resolver(name, resolver)

static load(file_)

static masked_copy(conf, keys)

Create a masked copy of this config that contains only a subset of the keys.

Parameters:

conf (DictConfig) – DictConfig object
keys (str | List[str]) – keys to preserve in the copy

Returns:

The masked DictConfig object.

Return type:

DictConfig

static merge(*configs)

Merge a list of previously created configs into a single one.

Parameters: configs (DictConfig | ListConfig | Dict[str | bytes | int | Enum | float | bool, Any] | List[Any] | Tuple[Any, ...] | Any) – Input configs
Returns: the merged config object.
Return type: ListConfig | DictConfig
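The merge semantics can be pictured as a recursive, left-to-right update in which later configs win and nested mappings are merged key by key. A minimal stdlib sketch on plain dicts; deep_merge is hypothetical and ignores lists, interpolations and structured configs, which OmegaConf.merge also handles. It copies its inputs, which is exactly the cost that unsafe_merge avoids:

```python
import copy
from typing import Any, Dict


def deep_merge(*configs: Dict[str, Any]) -> Dict[str, Any]:
    """Merge dicts left to right; nested dicts are merged, other values are overwritten."""
    merged: Dict[str, Any] = {}
    for cfg in configs:
        for key, value in cfg.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = deep_merge(merged[key], value)
            else:
                merged[key] = copy.deepcopy(value)  # copy so the inputs are never mutated
    return merged


base = {"dataset": {"name": "ZINC", "split": "train"}, "seed": 0}
override = {"dataset": {"split": "val"}}
result = deep_merge(base, override)
# result == {"dataset": {"name": "ZINC", "split": "val"}, "seed": 0}; base is unchanged
```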

static missing_keys(cfg)

Returns a set of missing keys in a dotlist style.

Parameters: cfg (Any) – An OmegaConf.Container, or an object convertible via OmegaConf.create (dict, list, …).
Returns: set of strings of the missing keys.
Raises: ValueError – On input not representing a config.
Return type: Set[str]
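Missing values are marked with the string "???" (see select() below), and missing_keys reports their locations as dotted paths. A minimal stdlib sketch of that traversal for nested dicts only; find_missing is hypothetical and does not handle lists or interpolations:

```python
from typing import Any, Dict, Set

MISSING = "???"  # marker for a mandatory value that has not been set


def find_missing(cfg: Dict[str, Any], prefix: str = "") -> Set[str]:
    """Return dotted paths of every '???' leaf in a nested dict."""
    missing: Set[str] = set()
    for key, value in cfg.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            missing |= find_missing(value, path)  # recurse into sub-configs
        elif value == MISSING:
            missing.add(path)
    return missing


cfg = {"data": {"data_dir": "???", "data_name": "ZINC"}, "seed": "???"}
print(sorted(find_missing(cfg)))  # ['data.data_dir', 'seed']
```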

static register_new_resolver(name, resolver, *, replace=False, use_cache=False)

Parameters:

name (str) – Name of the resolver.
resolver (Callable[[...], Any]) – Callable whose arguments are provided in the interpolation, e.g., with ${foo:x,0,${y.z}} these arguments are respectively “x” (str), 0 (int) and the value of y.z.
replace (bool) – If set to False (default), a ValueError is raised if a resolver with the same name has already been registered. If set to True, the new resolver replaces the previous one. NOTE: The cache on existing config objects is not affected; use OmegaConf.clear_cache(cfg) to clear it.
use_cache (bool) – Whether the resolver’s outputs should be cached. The cache is based only on the string literals representing the resolver arguments, e.g., ${foo:${bar}} will always return the same value regardless of the value of bar if the cache is enabled for foo.

static register_resolver(name, resolver)

static resolve(cfg)

Resolves all interpolations in the given config object in-place.

Parameters: cfg (Container) – An OmegaConf container (DictConfig, ListConfig).
Raises: ValueError – If the input object is not an OmegaConf container.

static save(config, f, resolve=False)

Save a configuration object to a file.

Parameters:

config (Any) – omegaconf.Config object (DictConfig or ListConfig).
f (str | Path | IO[Any]) – filename or file object
resolve (bool) – True to save a resolved config (defaults to False)

static select(cfg, key, *, default=_DEFAULT_MARKER_, throw_on_resolution_failure=True, throw_on_missing=False)

Parameters:

cfg (Container) – Config node to select from
key (str) – Key to select
default (Any) – Default value to return if key is not found
throw_on_resolution_failure (bool) – Raise an exception if an interpolation resolution error occurs, otherwise return None
throw_on_missing (bool) – Raise an exception if an attempt to select a missing key (with the value ‘???’) is made, otherwise return None

Returns:

selected value or None if not found.

Return type:

Any

static set_cache(conf, cache)

static set_readonly(conf, value)

static set_struct(conf, value)

static structured(obj, parent=None, flags=None)

static to_container(cfg, *, resolve=False, throw_on_missing=False, enum_to_str=False, structured_config_mode=SCMode.DICT)

Recursively converts an OmegaConf config to a primitive container (dict or list).

Parameters:

cfg (Any) – the config to convert
resolve (bool) – True to resolve all values
throw_on_missing (bool) – When True, raise MissingMandatoryValue if any missing values are present. When False (the default), replace missing values with the string “???” in the output container.
enum_to_str (bool) – True to convert Enum keys and values to strings
structured_config_mode (SCMode) –
Specify how Structured Configs (DictConfigs backed by a dataclass) are handled.
- By default (structured_config_mode=SCMode.DICT) structured configs are converted to plain dicts.
- If structured_config_mode=SCMode.DICT_CONFIG, structured config nodes will remain as DictConfig.
- If structured_config_mode=SCMode.INSTANTIATE, this function will instantiate structured configs (DictConfigs backed by a dataclass), by creating an instance of the underlying dataclass.
See also OmegaConf.to_object.

Returns:

A dict or a list representing this config as a primitive container.

Return type:

Dict[str | bytes | int | Enum | float | bool, Any] | List[Any] | None | str | Any

static to_object(cfg)

Recursively converts an OmegaConf config to a primitive container (dict or list). Any DictConfig objects backed by dataclasses or attrs classes are instantiated as instances of those backing classes.

This is an alias for OmegaConf.to_container(…, resolve=True, throw_on_missing=True, structured_config_mode=SCMode.INSTANTIATE).

Parameters: cfg (Any) – the config to convert
Returns: A dict or a list or dataclass representing this config.
Return type: Dict[str | bytes | int | Enum | float | bool, Any] | List[Any] | None | str | Any

static to_yaml(cfg, *, resolve=False, sort_keys=False)

Returns a YAML dump of this config object.

Parameters:

cfg (Any) – Config object, Structured Config type or instance
resolve (bool) – if True, will return a string with the interpolations resolved, otherwise interpolations are preserved
sort_keys (bool) – If True, will print dict keys in sorted order. (default: False)

Returns:

A string containing the yaml representation.

Return type:

str

static unsafe_merge(*configs)

Merge a list of previously created configs into a single one. This is much faster than OmegaConf.merge() because the input configs are not copied. However, the input configs must not be used after this operation, as they will become inconsistent.

Parameters: configs (DictConfig | ListConfig | Dict[str | bytes | int | Enum | float | bool, Any] | List[Any] | Tuple[Any, ...] | Any) – Input configs
Returns: the merged config object.
Return type: ListConfig | DictConfig

static update(cfg, key, value=None, *, merge=True, force_add=False)

Updates a dot-separated key sequence to a value.

Parameters:

cfg (Container) – input config to update
key (str) – key to update (can be a dot-separated path)
value (Any) – value to set; if value is a list or a dict, it will be merged or set depending on merge
merge (bool) – If value is a dict or a list, True (default) to merge into the destination, False to replace the destination.
force_add (bool) – insert the entire path regardless of Struct flag or Structured Config nodes.

__init__()

class Path(*args, **kwargs)

Bases: PurePath

PurePath subclass that can make system calls.

Path represents a filesystem path but, unlike PurePath, also offers methods to do system calls on path objects. Depending on your system, instantiating a Path will return either a PosixPath or a WindowsPath object. You can also instantiate a PosixPath or WindowsPath directly, but cannot instantiate a WindowsPath on a POSIX system or vice versa.

classmethod cwd()

Return a new path pointing to the current working directory (as returned by os.getcwd()).

classmethod home()

Return a new path pointing to the user’s home directory (as returned by os.path.expanduser(‘~’)).

absolute()

Return an absolute version of this path by prepending the current working directory. No normalization or symlink resolution is performed.

Use resolve() to get the canonical path to a file.

chmod(mode, *, follow_symlinks=True)

Change the permissions of the path, like os.chmod().

exists()

Whether this path exists.

expanduser()

Return a new path with expanded ~ and ~user constructs (as returned by os.path.expanduser).

glob(pattern)

Iterate over this subtree and yield all existing files (of any kind, including directories) matching the given relative pattern.

group()

Return the group name of the file gid.

hardlink_to(target)

Make this path a hard link pointing to the same file as target.

Note the order of arguments (self, target) is the reverse of os.link’s.

is_block_device()

Whether this path is a block device.

is_char_device()

Whether this path is a character device.

is_dir()

Whether this path is a directory.

is_fifo()

Whether this path is a FIFO.

is_file()

Whether this path is a regular file (also True for symlinks pointing to regular files).

is_mount()

Check if this path is a POSIX mount point.

is_socket()

Whether this path is a socket.

is_symlink()

Whether this path is a symbolic link.

iterdir()

Iterate over the files in this directory. Does not yield any result for the special paths ‘.’ and ‘..’.

lchmod(mode)

Like chmod(), except if the path points to a symlink, the symlink’s permissions are changed, rather than its target’s.

link_to(target)

Make the target path a hard link pointing to this path.

Note this function does not make this path a hard link to target, despite the implication of the function and argument names. The order of arguments (target, link) is the reverse of Path.symlink_to, but matches that of os.link.

Deprecated since Python 3.10 and scheduled for removal in Python 3.12. Use hardlink_to() instead.

lstat()

Like stat(), except if the path points to a symlink, the symlink’s status information is returned, rather than its target’s.

mkdir(mode=511, parents=False, exist_ok=False)

Create a new directory at this given path. The default mode=511 is 0o777 in octal.

open(mode='r', buffering=-1, encoding=None, errors=None, newline=None)

Open the file pointed to by this path and return a file object, as the built-in open() function does.

owner()

Return the login name of the file owner.

read_bytes()

Open the file in bytes mode, read it, and close the file.

read_text(encoding=None, errors=None)

Open the file in text mode, read it, and close the file.

readlink()

Return the path to which the symbolic link points.

rename(target)

Rename this path to the target path.

The target path may be absolute or relative. Relative paths are interpreted relative to the current working directory, not the directory of the Path object.

Returns the new Path instance pointing to the target path.

replace(target)

Rename this path to the target path, overwriting if that path exists.

The target path may be absolute or relative. Relative paths are interpreted relative to the current working directory, not the directory of the Path object.

Returns the new Path instance pointing to the target path.
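The difference between rename() and replace() matters mostly when the target already exists: replace() always overwrites it, while rename() raises FileExistsError on Windows in that case. A small self-contained demonstration using only the standard library and a temporary directory; the file names are arbitrary:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    src = tmp_dir / "new.txt"
    dst = tmp_dir / "current.txt"
    src.write_text("new contents")
    dst.write_text("old contents")

    # Both paths are absolute; a relative target would be resolved against the CWD, not tmp_dir
    result = src.replace(dst)  # overwrites dst and returns a Path pointing to it
    final_text = result.read_text()
    src_exists = src.exists()  # the source name no longer exists after the move
```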

resolve(strict=False)

Make the path absolute, resolving all symlinks on the way and also normalizing it.

rglob(pattern)

Recursively yield all existing files (of any kind, including directories) matching the given relative pattern, anywhere in this subtree.

rmdir()

Remove this directory. The directory must be empty.

samefile(other_path)

Return whether other_path refers to the same file as this path (as returned by os.path.samefile()).

stat(*, follow_symlinks=True)

Return the result of the stat() system call on this path, like os.stat() does.

symlink_to(target, target_is_directory=False)

Make this path a symlink pointing to the target path. Note the order of arguments (link, target) is the reverse of os.symlink.

touch(mode=438, exist_ok=True)

Create this file with the given access mode, if it doesn’t exist. The default mode=438 is 0o666 in octal.

unlink(missing_ok=False)

Remove this file or link. If the path is a directory, use rmdir() instead.

write_bytes(data)

Open the file in bytes mode, write to it, and close the file.

write_text(data, encoding=None, errors=None, newline=None)

Open the file in text mode, write to it, and close the file.

class QM9(root, transform=None, pre_transform=None, pre_filter=None, force_reload=False)

Bases: InMemoryDataset

The QM9 dataset from the “MoleculeNet: A Benchmark for Molecular Machine Learning” paper, consisting of about 130,000 molecules with 19 regression targets. Each molecule includes complete spatial information for the single low energy conformation of the atoms in the molecule. In addition, we provide the atom features from the “Neural Message Passing for Quantum Chemistry” paper.

Target	Property	Description	Unit
0	$\mu$	Dipole moment	$\textrm{D}$
1	$\alpha$	Isotropic polarizability	${a_0}^3$
2	$\epsilon_{\textrm{HOMO}}$	Highest occupied molecular orbital energy	$\textrm{eV}$
3	$\epsilon_{\textrm{LUMO}}$	Lowest unoccupied molecular orbital energy	$\textrm{eV}$
4	$\Delta \epsilon$	Gap between $\epsilon_{\textrm{HOMO}}$ and $\epsilon_{\textrm{LUMO}}$	$\textrm{eV}$
5	$\langle R^2 \rangle$	Electronic spatial extent	${a_0}^2$
6	$\textrm{ZPVE}$	Zero point vibrational energy	$\textrm{eV}$
7	$U_0$	Internal energy at 0K	$\textrm{eV}$
8	$U$	Internal energy at 298.15K	$\textrm{eV}$
9	$H$	Enthalpy at 298.15K	$\textrm{eV}$
10	$G$	Free energy at 298.15K	$\textrm{eV}$
11	$c_{\textrm{v}}$	Heat capacity at 298.15K	$\frac{\textrm{cal}}{\textrm{mol K}}$
12	$U_0^{\textrm{ATOM}}$	Atomization energy at 0K	$\textrm{eV}$
13	$U^{\textrm{ATOM}}$	Atomization energy at 298.15K	$\textrm{eV}$
14	$H^{\textrm{ATOM}}$	Atomization enthalpy at 298.15K	$\textrm{eV}$
15	$G^{\textrm{ATOM}}$	Atomization free energy at 298.15K	$\textrm{eV}$
16	$A$	Rotational constant	$\textrm{GHz}$
17	$B$	Rotational constant	$\textrm{GHz}$
18	$C$	Rotational constant	$\textrm{GHz}$
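Because the loader's qm9_target_index picks one of these 19 columns, a lookup from index to property and unit helps when reading configs. A minimal sketch, assuming each molecule's target vector is ordered as in the table above; QM9_TARGETS, select_target and the placeholder values are hypothetical, not real QM9 data:

```python
from typing import List, Sequence, Tuple

# (symbol, unit) for targets 0-18, transcribed from the QM9 target table
QM9_TARGETS: List[Tuple[str, str]] = [
    ("mu", "D"), ("alpha", "a0^3"), ("epsilon_HOMO", "eV"), ("epsilon_LUMO", "eV"),
    ("Delta_epsilon", "eV"), ("R2", "a0^2"), ("ZPVE", "eV"), ("U0", "eV"),
    ("U", "eV"), ("H", "eV"), ("G", "eV"), ("c_v", "cal/(mol K)"),
    ("U0_ATOM", "eV"), ("U_ATOM", "eV"), ("H_ATOM", "eV"), ("G_ATOM", "eV"),
    ("A", "GHz"), ("B", "GHz"), ("C", "GHz"),
]


def select_target(y_row: Sequence[float], target_index: int = 0) -> Tuple[float, str, str]:
    """Return (value, symbol, unit) for one of the 19 regression targets."""
    if not 0 <= target_index < len(QM9_TARGETS):
        raise ValueError(f"target_index must be in [0, {len(QM9_TARGETS) - 1}]")
    if len(y_row) != len(QM9_TARGETS):
        raise ValueError("expected one value per QM9 target")
    symbol, unit = QM9_TARGETS[target_index]
    return y_row[target_index], symbol, unit


y_row = [float(i) for i in range(19)]  # placeholder values only
value, symbol, unit = select_target(y_row, 4)  # the HOMO-LUMO gap column
```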

Note

We also provide a pre-processed version of the dataset in case rdkit is not installed. The pre-processed version matches the manually processed version as outlined in process().

Parameters:

root (str) – Root directory where the dataset should be saved.
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
force_reload (bool, optional) – Whether to re-process the dataset. (default: False)

STATS:

#graphs	#nodes	#edges	#features	#tasks
130,831	~18.0	~37.3	11	19

__init__(root, transform=None, pre_transform=None, pre_filter=None, force_reload=False)

atomref(target)

download()

Downloads the dataset to the self.raw_dir folder.

mean(target)

process()

Processes the dataset to the self.processed_dir folder.

std(target)

property processed_file_names: str

The name of the file in the self.processed_dir folder that must be present in order to skip processing.

processed_url = 'https://data.pyg.org/datasets/qm9_v3.zip'

property raw_file_names: List[str]

The names of the files in the self.raw_dir folder that must be present in order to skip downloading.

raw_url = 'https://deepchemdata.s3-us-west-1.amazonaws.com/datasets/molnet_publish/qm9.zip'

raw_url2 = 'https://ndownloader.figshare.com/files/3195404'

class ZINC(root, subset=False, split='train', transform=None, pre_transform=None, pre_filter=None, force_reload=False)

Bases: InMemoryDataset

The ZINC dataset from the ZINC database and the “Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules” paper, containing about 250,000 molecular graphs with up to 38 heavy atoms. The task is to regress the penalized logP (also called constrained solubility in some works), given by y = logP - SAS - cycles, where logP is the water-octanol partition coefficient, SAS is the synthetic accessibility score, and cycles denotes the number of cycles with more than six atoms. Penalized logP is a score commonly used for training molecular generation models, see, e.g., the “Junction Tree Variational Autoencoder for Molecular Graph Generation” and “Grammar Variational Autoencoder” papers.
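The target formula y = logP - SAS - cycles can be written out directly. A minimal sketch, assuming logP and SAS are already computed (for example with a cheminformatics toolkit such as RDKit, which is out of scope here) and that ring sizes are given as a plain list; some works normalize each term before combining them, whereas this sketch uses the raw values exactly as in the formula. The function names and numbers are illustrative only:

```python
from typing import Iterable


def count_large_cycles(ring_sizes: Iterable[int]) -> int:
    """The 'cycles' term: number of rings with more than six atoms."""
    return sum(1 for size in ring_sizes if size > 6)


def penalized_logp(logp: float, sas: float, ring_sizes: Iterable[int]) -> float:
    """y = logP - SAS - cycles, with each term used as-is (no normalization)."""
    return logp - sas - count_large_cycles(ring_sizes)


# Illustrative numbers only, not taken from ZINC
y = penalized_logp(logp=2.5, sas=1.5, ring_sizes=[6, 6, 7])  # 2.5 - 1.5 - 1 = 0.0
```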

Parameters:

root (str) – Root directory where the dataset should be saved.
subset (bool, optional) – If set to True, will only load a subset of the dataset (12,000 molecular graphs), following the “Benchmarking Graph Neural Networks” paper. (default: False)
split (str, optional) – If "train", loads the training dataset. If "val", loads the validation dataset. If "test", loads the test dataset. (default: "train")
transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before every access. (default: None)
pre_transform (callable, optional) – A function/transform that takes in a torch_geometric.data.Data object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)
pre_filter (callable, optional) – A function that takes in a torch_geometric.data.Data object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)
force_reload (bool, optional) – Whether to re-process the dataset. (default: False)

STATS:

Name	#graphs	#nodes	#edges	#features	#classes
ZINC Full	249,456	~23.2	~49.8	1	1
ZINC Subset	12,000	~23.2	~49.8	1	1

__init__(root, subset=False, split='train', transform=None, pre_transform=None, pre_filter=None, force_reload=False)

download()

Downloads the dataset to the self.raw_dir folder.

process()

Processes the dataset to the self.processed_dir folder.

property processed_dir: str

The folder in which processed files are stored.

property processed_file_names: List[str]

The names of the files in the self.processed_dir folder that must be present in order to skip processing.

property raw_file_names: List[str]

The names of the files in the self.raw_dir folder that must be present in order to skip downloading.

split_url = 'https://raw.githubusercontent.com/graphdeeplearning/benchmarking-gnns/master/data/molecules/{}.index'

url = 'https://www.dropbox.com/s/feo9qle74kg48gy/molecules.zip?dl=1'
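split_url is a template whose {} placeholder takes a split name. A sketch of filling it in, assuming the split names "train", "val" and "test" accepted by the split parameter; split_index_urls is a hypothetical helper and no network access is performed:

```python
from typing import Dict, Tuple

SPLIT_URL = (
    "https://raw.githubusercontent.com/graphdeeplearning/"
    "benchmarking-gnns/master/data/molecules/{}.index"
)


def split_index_urls(splits: Tuple[str, ...] = ("train", "val", "test")) -> Dict[str, str]:
    """Fill the '{}' placeholder of the split URL template once per split name."""
    return {split: SPLIT_URL.format(split) for split in splits}


urls = split_index_urls()
# urls["val"] ends with "/data/molecules/val.index"
```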
