topobench.data.datasets.mantra_dataset module#

Dataset class for the MANTRA dataset.

class Data(x=None, edge_index=None, edge_attr=None, y=None, pos=None, time=None, **kwargs)#

Bases: BaseData, FeatureStore, GraphStore

A data object describing a homogeneous graph. The data object can hold node-level, link-level and graph-level attributes. In general, Data tries to mimic the behavior of a regular Python dictionary. In addition, it provides useful functionality for analyzing graph structures, and provides basic PyTorch tensor functionalities. See here for the accompanying tutorial.

from torch_geometric.data import Data

data = Data(x=x, edge_index=edge_index, ...)

# Add additional arguments to `data`:
data.train_idx = torch.tensor([...], dtype=torch.long)
data.test_mask = torch.tensor([...], dtype=torch.bool)

# Analyzing the graph structure:
data.num_nodes
>>> 23

data.is_directed()
>>> False

# PyTorch tensor functionality:
data = data.pin_memory()
data = data.to('cuda:0', non_blocking=True)
Parameters:
  • x (torch.Tensor, optional) – Node feature matrix with shape [num_nodes, num_node_features]. (default: None)

  • edge_index (LongTensor, optional) – Graph connectivity in COO format with shape [2, num_edges]. (default: None)

  • edge_attr (torch.Tensor, optional) – Edge feature matrix with shape [num_edges, num_edge_features]. (default: None)

  • y (torch.Tensor, optional) – Graph-level or node-level ground-truth labels with arbitrary shape. (default: None)

  • pos (torch.Tensor, optional) – Node position matrix with shape [num_nodes, num_dimensions]. (default: None)

  • time (torch.Tensor, optional) – The timestamps for each event with shape [num_edges] or [num_nodes]. (default: None)

  • **kwargs (optional) – Additional attributes.

classmethod from_dict(mapping)#

Creates a Data object from a dictionary.

__init__(x=None, edge_index=None, edge_attr=None, y=None, pos=None, time=None, **kwargs)#
connected_components()#

Extracts connected components of the graph using a union-find algorithm. The components are returned as a list of Data objects, where each object represents a connected component of the graph.

data = Data()
data.x = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
data.y = torch.tensor([[1.1], [2.1], [3.1], [4.1]])
data.edge_index = torch.tensor(
    [[0, 1, 2, 3], [1, 0, 3, 2]], dtype=torch.long
)

components = data.connected_components()
print(len(components))
>>> 2

print(components[0])
>>> Data(x=[2, 1], y=[2, 1], edge_index=[2, 2])
Returns:

A list of disconnected components.

Return type:

List[Data]
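
As a rough illustration of the union-find idea mentioned above, the following pure-Python sketch groups node indices into components from a COO-style edge list. It is not PyG's implementation: the helper name component_labels is hypothetical, and it works on plain lists rather than tensors or Data objects.

```python
def component_labels(num_nodes, edge_index):
    """Group nodes into connected components with union-find.

    `edge_index` is a pair of lists [sources, targets] (COO layout).
    Hypothetical helper for illustration; not part of torch_geometric.
    """
    parent = list(range(num_nodes))

    def find(i):
        # Walk up to the root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for src, dst in zip(*edge_index):
        root_src, root_dst = find(src), find(dst)
        if root_src != root_dst:
            parent[root_dst] = root_src  # merge the two sets

    # Collect the nodes that share a root into one component.
    groups = {}
    for node in range(num_nodes):
        groups.setdefault(find(node), []).append(node)
    return list(groups.values())


# Same connectivity as the example above: edges 0-1 and 2-3.
components = component_labels(4, [[0, 1, 2, 3], [1, 0, 3, 2]])
print(components)
```

Each inner list holds the node indices of one component; a real implementation would then slice the node-level and edge-level attributes per component.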

debug()#
edge_subgraph(subset)#

Returns the induced subgraph given by the edge indices subset. Will currently preserve all the nodes in the graph, even if they are isolated after subgraph computation.

Parameters:

subset (LongTensor or BoolTensor) – The edges to keep.

get_all_edge_attrs()#

Returns all registered edge attributes.

get_all_tensor_attrs()#

Obtains all feature attributes stored in Data.

is_edge_attr(key)#

Returns True if the object at key key denotes an edge-level tensor attribute.

is_node_attr(key)#

Returns True if the object at key key denotes a node-level tensor attribute.

stores_as(data)#
subgraph(subset)#

Returns the induced subgraph given by the node indices subset.

Parameters:

subset (LongTensor or BoolTensor) – The nodes to keep.
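
To make "induced subgraph" concrete, here is a minimal sketch on plain Python lists. The helper induced_subgraph is hypothetical, and relabeling the kept nodes to consecutive indices is an assumption of this sketch rather than a documented guarantee.

```python
def induced_subgraph(edge_index, subset):
    """Keep only edges whose two endpoints both lie in `subset`.

    Nodes are relabeled to 0..len(subset)-1 in the order given by `subset`
    (an assumption of this sketch). Hypothetical helper, plain lists only.
    """
    new_id = {node: i for i, node in enumerate(subset)}
    sources, targets = [], []
    for src, dst in zip(*edge_index):
        if src in new_id and dst in new_id:
            sources.append(new_id[src])
            targets.append(new_id[dst])
    return [sources, targets]


# Path graph 0-1-2-3 stored with both edge directions.
edge_index = [[0, 1, 1, 2, 2, 3], [1, 0, 2, 1, 3, 2]]
sub = induced_subgraph(edge_index, subset=[1, 2, 3])
print(sub)
```

The edges touching node 0 are dropped because node 0 is not in the subset.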

to_dict()#

Returns a dictionary of stored key/value pairs.

to_heterogeneous(node_type=None, edge_type=None, node_type_names=None, edge_type_names=None)#

Converts a Data object to a heterogeneous HeteroData object. For this, node and edge attributes are split according to the node-level and edge-level vectors node_type and edge_type, respectively. node_type_names and edge_type_names can be used to give meaningful node and edge type names, respectively. That is, the node_type 0 is given by node_type_names[0]. If the Data object was constructed via to_homogeneous(), the object can be reconstructed without any need to pass in additional arguments.

Parameters:
  • node_type (torch.Tensor, optional) – A node-level vector denoting the type of each node. (default: None)

  • edge_type (torch.Tensor, optional) – An edge-level vector denoting the type of each edge. (default: None)

  • node_type_names (List[str], optional) – The names of node types. (default: None)

  • edge_type_names (List[Tuple[str, str, str]], optional) – The names of edge types. (default: None)

to_namedtuple()#

Returns a NamedTuple of stored key/value pairs.

update(data)#

Updates the data object with the elements from another data object. Added elements will override existing ones (in case of duplicates).

validate(raise_on_error=True)#

Validates the correctness of the data.

property batch: Tensor | None#

property edge_attr: Tensor | None#

property edge_index: Tensor | None#

property edge_stores: List[EdgeStorage]#

property edge_weight: Tensor | None#

property face: Tensor | None#

property node_stores: List[NodeStorage]#

property num_edge_features: int#

Returns the number of features per edge in the graph.

property num_edge_types: int#

Returns the number of edge types in the graph.

property num_faces: int | None#

Returns the number of faces in the mesh.

property num_features: int#

Returns the number of features per node in the graph. Alias for num_node_features.

property num_node_features: int#

Returns the number of features per node in the graph.

property num_node_types: int#

Returns the number of node types in the graph.

property num_nodes: int | None#

Returns the number of nodes in the graph.

Note

The number of nodes in the data object is automatically inferred in case node-level attributes are present, e.g., data.x. In some cases, however, a graph may only be given without any node-level attributes. PyG then guesses the number of nodes according to edge_index.max().item() + 1. However, if the graph contains isolated nodes, this number may be wrong, which can result in unexpected behavior. Thus, we recommend setting the number of nodes in your data object explicitly via data.num_nodes = .... You will be given a warning that requests you to do so.
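
The pitfall described in this note can be reproduced without any tensors. The sketch below uses plain lists in place of edge_index and applies the same "largest index plus one" guess; the variable names are illustrative only.

```python
# COO edge list for a graph with 5 nodes where node 4 is isolated.
edge_index = [[0, 1, 2], [1, 2, 3]]
true_num_nodes = 5

# The fallback guess described above: largest node index plus one.
guessed_num_nodes = max(max(row) for row in edge_index) + 1

print(guessed_num_nodes)
print(guessed_num_nodes == true_num_nodes)
```

Because node 4 never appears in an edge, the guess undercounts by one, which is why setting the node count explicitly is the safer option.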

property pos: Tensor | None#

property stores: List[BaseStorage]#

property time: Tensor | None#

property x: Tensor | None#

property y: Tensor | int | float | None#

class DictConfig(content, key=None, parent=None, ref_type=typing.Any, key_type=typing.Any, element_type=typing.Any, is_optional=True, flags=None)#

Bases: BaseContainer, MutableMapping[Any, Any]

__init__(content, key=None, parent=None, ref_type=typing.Any, key_type=typing.Any, element_type=typing.Any, is_optional=True, flags=None)#
copy()#
get(key, default_value=None)#

Return the value for key if key is in the dictionary, else default_value (defaulting to None).

items() → a set-like object providing a view on D's items#
items_ex(resolve=True, keys=None)#
keys() → a set-like object providing a view on D's keys#
pop(k[, d]) → v, remove specified key and return the corresponding value.#

If key is not found, d is returned if given, otherwise KeyError is raised.

setdefault(k[, d]) → D.get(k,d), also set D[k]=d if k not in D#
class InMemoryDataset(root=None, transform=None, pre_transform=None, pre_filter=None, log=True, force_reload=False)#

Bases: Dataset

Dataset base class for creating graph datasets which easily fit into CPU memory. See here for the accompanying tutorial.

Parameters:
  • root (str, optional) – Root directory where the dataset should be saved. (default: None)

  • transform (callable, optional) – A function/transform that takes in a Data or HeteroData object and returns a transformed version. The data object will be transformed before every access. (default: None)

  • pre_transform (callable, optional) – A function/transform that takes in a Data or HeteroData object and returns a transformed version. The data object will be transformed before being saved to disk. (default: None)

  • pre_filter (callable, optional) – A function that takes in a Data or HeteroData object and returns a boolean value, indicating whether the data object should be included in the final dataset. (default: None)

  • log (bool, optional) – Whether to print any console output while downloading and processing the dataset. (default: True)

  • force_reload (bool, optional) – Whether to re-process the dataset. (default: False)
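
The difference between the three callables is when they run. The sketch below mimics that timing with plain dictionaries standing in for data objects. The helper names preprocess and access are hypothetical, and applying pre_filter before pre_transform is an assumption about the usual processing order, not something stated above.

```python
def preprocess(data_list, pre_filter=None, pre_transform=None):
    """Apply the one-time steps that run before data is saved to disk."""
    if pre_filter is not None:
        data_list = [d for d in data_list if pre_filter(d)]
    if pre_transform is not None:
        data_list = [pre_transform(d) for d in data_list]
    return data_list


def access(stored, idx, transform=None):
    """Apply the per-access transform to a copy of the stored object."""
    item = dict(stored[idx])
    return item if transform is None else transform(item)


raw = [{"num_nodes": 3}, {"num_nodes": 10}, {"num_nodes": 7}]
stored = preprocess(
    raw,
    pre_filter=lambda d: d["num_nodes"] >= 5,  # drop small graphs
    pre_transform=lambda d: {**d, "degree_done": True},  # computed once
)
item = access(stored, 0, transform=lambda d: {**d, "augmented": True})
print(len(stored), item)
```

The stored objects carry only the pre_transform result; the transform output exists only on the object returned by the access.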

classmethod save(data_list, path)#

Saves a list of data objects to the file path path.

static collate(data_list)#

Collates a list of Data or HeteroData objects to the internal storage format of InMemoryDataset.

__init__(root=None, transform=None, pre_transform=None, pre_filter=None, log=True, force_reload=False)#
copy(idx=None)#

Performs a deep-copy of the dataset. If idx is not given, will clone the full dataset. Otherwise, will only clone a subset of the dataset from indices idx. Indices can be slices, lists, tuples, and a torch.Tensor or np.ndarray of type long or bool.

cpu(*args)#

Moves the dataset to CPU memory.

cuda(device=None)#

Moves the dataset to CUDA memory.

get(idx)#

Gets the data object at index idx.

len()#

Returns the number of data objects stored in the dataset.

load(path, data_cls=<class 'torch_geometric.data.data.Data'>)#

Loads the dataset from the file path path.

to(device)#

Performs device conversion of the whole dataset.

to_on_disk_dataset(root=None, backend='sqlite', log=True)#

Converts the InMemoryDataset to an OnDiskDataset variant. Useful for distributed training and hardware instances with a limited amount of shared memory.

Parameters:
  • root (str, optional) – Root directory where the dataset should be saved. If set to None, will save the dataset in root/on_disk. Note that it is important to specify root to account for different dataset splits. (default: None)

  • backend (str) – The database backend to use. (default: "sqlite")

  • log (bool, optional) – Whether to print any console output while processing the dataset. (default: True)

property data: Any#

property num_classes: int#

Returns the number of classes in the dataset.

property processed_file_names: str | List[str] | Tuple[str, ...]#

The names of the files in the self.processed_dir folder that must be present in order to skip processing.

property raw_file_names: str | List[str] | Tuple[str, ...]#

The names of the files in the self.raw_dir folder that must be present in order to skip downloading.

class MantraDataset(root, name, parameters, **kwargs)#

Bases: InMemoryDataset

Dataset class for MANTRA manifold dataset.

Parameters:
root : str

Root directory where the dataset will be saved.

name : str

Name of the dataset.

parameters : DictConfig

Configuration parameters for the dataset.

**kwargs : dict

Additional keyword arguments.

Attributes:
URLS (dict): Dictionary containing the URLs for downloading the dataset.
FILE_FORMAT (dict): Dictionary containing the file formats for the dataset.
RAW_FILE_NAMES (dict): Dictionary containing the raw file names for the dataset.
__init__(root, name, parameters, **kwargs)#
download()#

Downloads the dataset from a URL and saves it to the raw directory.

Raises:

FileNotFoundError – If the dataset URL is not found.

process()#

Handle the data for the dataset.

This method loads the JSON file for MANTRA for the specified manifold dimension, applies the respective preprocessing if specified and saves the preprocessed data to the appropriate location.

FILE_FORMAT: ClassVar = {'2_manifolds': 'json.gz', '3_manifolds': 'json.gz'}#
RAW_FILE_NAMES: ClassVar = {}#
URLS: ClassVar = {'2_manifolds': 'https://github.com/aidos-lab/mantra/releases/download/{version}/2_manifolds.json.gz', '3_manifolds': 'https://github.com/aidos-lab/mantra/releases/download/{version}/3_manifolds.json.gz'}#
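
The {version} placeholder in URLS suggests that the template is filled in before downloading. A minimal sketch, assuming plain str.format is used and that the dictionary key is built from the manifold dimension; the version string below is a placeholder, not a real release tag.

```python
URLS = {
    "2_manifolds": "https://github.com/aidos-lab/mantra/releases/download/{version}/2_manifolds.json.gz",
    "3_manifolds": "https://github.com/aidos-lab/mantra/releases/download/{version}/3_manifolds.json.gz",
}

manifold_dim = 2
version = "vX.Y.Z"  # placeholder, not a real release tag

# Pick the template for the requested dimension and fill in the version.
url = URLS[f"{manifold_dim}_manifolds"].format(version=version)
print(url)
```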
property processed_dir: str#

Return the path to the processed directory of the dataset.

Returns:
str

Path to the processed directory.

property processed_file_names: str#

Return the processed file name for the dataset.

Returns:
str

Processed file name.

property raw_dir: str#

Return the path to the raw directory of the dataset.

Returns:
str

Path to the raw directory.

property raw_file_names: list[str]#

Return the raw file names for the dataset.

Returns:
list[str]

List of raw file names.

class OmegaConf#

Bases: object

OmegaConf primary class

classmethod clear_resolver(name)#

Clear (remove) a resolver, only if it exists.

Returns a bool: True if resolver is removed and False if not removed.

Parameters:

name (str) – Name of the resolver.

Returns:

A bool (True if resolver is removed, False if not found before removing).

Return type:

bool

classmethod has_resolver(name)#
static clear_cache(conf)#
static clear_resolvers()#

Clear (remove) all OmegaConf resolvers, then re-register OmegaConf’s default resolvers.

static copy_cache(from_config, to_config)#
static create(obj=_DEFAULT_MARKER_, parent=None, flags=None)#
static from_cli(args_list=None)#
static from_dotlist(dotlist)#

Creates a config from a list of dotlist-style strings.

Parameters:

dotlist (List[str]) – A list of dotlist-style strings, e.g. ["foo.bar=1", "baz=qux"].

Returns:

A DictConfig object created from the dotlist.

Return type:

DictConfig
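
To show what a dotlist encodes, the sketch below expands dotted keys into nested dictionaries using only the standard library. It is a simplified stand-in, not OmegaConf's implementation: the helper dotlist_to_dict is hypothetical, and values are kept as strings here, whereas OmegaConf also parses them into typed values.

```python
def dotlist_to_dict(dotlist):
    """Turn ["a.b=1", "c=x"] into {"a": {"b": "1"}, "c": "x"}.

    Simplified: values stay strings. Hypothetical helper for
    illustration only.
    """
    result = {}
    for item in dotlist:
        dotted_key, _, value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for part in parents:
            node = node.setdefault(part, {})  # descend, creating levels
        node[leaf] = value
    return result


config = dotlist_to_dict(["foo.bar=1", "baz=qux"])
print(config)
```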

static get_cache(conf)#
static get_type(obj, key=None)#
static is_config(obj)#
static is_dict(obj)#
static is_interpolation(node, key=None)#
static is_list(obj)#
static is_missing(cfg, key)#
static is_readonly(conf)#
static is_struct(conf)#
static legacy_register_resolver(name, resolver)#
static load(file_)#
static masked_copy(conf, keys)#

Create a masked copy of this config that contains a subset of the keys

Parameters:
Returns:

The masked DictConfig object.

Return type:

DictConfig

static merge(*configs)#

Merge a list of previously created configs into a single one

Parameters:

configs (DictConfig | ListConfig | Dict[str | bytes | int | Enum | float | bool, Any] | List[Any] | Tuple[Any, ...] | Any) – Input configs

Returns:

the merged config object.

Return type:

ListConfig | DictConfig

static missing_keys(cfg)#

Returns a set of missing keys in a dotlist style.

Parameters:

cfg (Any) – An OmegaConf.Container, or a convertible object via OmegaConf.create (dict, list, …).

Returns:

set of strings of the missing keys.

Raises:

ValueError – On input not representing a config.

Return type:

Set[str]
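
The "dotlist style" of the returned keys can be illustrated on a plain nested dictionary. The sketch below is a rough stand-in that handles dictionaries only; the helper find_missing is hypothetical, and "???" is the missing-value marker used elsewhere in this reference.

```python
def find_missing(cfg, prefix=""):
    """Collect dotted paths whose value is the missing marker "???"."""
    missing = set()
    for key, value in cfg.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            missing |= find_missing(value, path)  # recurse into sub-configs
        elif value == "???":
            missing.add(path)
    return missing


cfg = {"db": {"host": "???", "port": 5432}, "name": "???"}
missing = find_missing(cfg)
print(sorted(missing))
```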

static register_new_resolver(name, resolver, *, replace=False, use_cache=False)#

Register a resolver.

Parameters:
  • name (str) – Name of the resolver.

  • resolver (Callable[[...], Any]) – Callable whose arguments are provided in the interpolation, e.g., with ${foo:x,0,${y.z}} these arguments are respectively “x” (str), 0 (int) and the value of y.z.

  • replace (bool) – If set to False (default), then a ValueError is raised if an existing resolver has already been registered with the same name. If set to True, then the new resolver replaces the previous one. NOTE: The cache on existing config objects is not affected, use OmegaConf.clear_cache(cfg) to clear it.

  • use_cache (bool) – Whether the resolver’s outputs should be cached. The cache is based only on the string literals representing the resolver arguments, e.g., ${foo:${bar}} will always return the same value regardless of the value of bar if the cache is enabled for foo.

static register_resolver(name, resolver)#
static resolve(cfg)#

Resolves all interpolations in the given config object in-place.

Parameters:

cfg (Container) – An OmegaConf container (DictConfig, ListConfig). Raises a ValueError if the input object is not an OmegaConf container.

static save(config, f, resolve=False)#

Save a configuration object to a file

Parameters:
  • config (Any) – omegaconf.Config object (DictConfig or ListConfig).

  • f (str | Path | IO[Any]) – filename or file object

  • resolve (bool) – True to save a resolved config (defaults to False)

static select(cfg, key, *, default=_DEFAULT_MARKER_, throw_on_resolution_failure=True, throw_on_missing=False)#
Parameters:
  • cfg (Container) – Config node to select from

  • key (str) – Key to select

  • default (Any) – Default value to return if key is not found

  • throw_on_resolution_failure (bool) – Raise an exception if an interpolation resolution error occurs, otherwise return None

  • throw_on_missing (bool) – Raise an exception if an attempt to select a missing key (with the value ‘???’) is made, otherwise return None

Returns:

selected value or None if not found.

Return type:

Any

static set_cache(conf, cache)#
static set_readonly(conf, value)#
static set_struct(conf, value)#
static structured(obj, parent=None, flags=None)#
static to_container(cfg, *, resolve=False, throw_on_missing=False, enum_to_str=False, structured_config_mode=SCMode.DICT)#

Recursively converts an OmegaConf config to a primitive container (dict or list).

Parameters:
  • cfg (Any) – the config to convert

  • resolve (bool) – True to resolve all values

  • throw_on_missing (bool) – When True, raise MissingMandatoryValue if any missing values are present. When False (the default), replace missing values with the string “???” in the output container.

  • enum_to_str (bool) – True to convert Enum keys and values to strings

  • structured_config_mode (SCMode) –

    Specify how Structured Configs (DictConfigs backed by a dataclass) are handled.
    • By default (structured_config_mode=SCMode.DICT) structured configs are converted to plain dicts.

    • If structured_config_mode=SCMode.DICT_CONFIG, structured config nodes will remain as DictConfig.

    • If structured_config_mode=SCMode.INSTANTIATE, this function will instantiate structured configs (DictConfigs backed by a dataclass), by creating an instance of the underlying dataclass.

    See also OmegaConf.to_object.

Returns:

A dict or a list representing this config as a primitive container.

Return type:

Dict[str | bytes | int | Enum | float | bool, Any] | List[Any] | None | str | Any

static to_object(cfg)#

Recursively converts an OmegaConf config to a primitive container (dict or list). Any DictConfig objects backed by dataclasses or attrs classes are instantiated as instances of those backing classes.

This is an alias for OmegaConf.to_container(…, resolve=True, throw_on_missing=True, structured_config_mode=SCMode.INSTANTIATE).

Parameters:

cfg (Any) – the config to convert

Returns:

A dict or a list or dataclass representing this config.

Return type:

Dict[str | bytes | int | Enum | float | bool, Any] | List[Any] | None | str | Any

static to_yaml(cfg, *, resolve=False, sort_keys=False)#

Returns a YAML dump of this config object.

Parameters:
  • cfg (Any) – Config object, Structured Config type or instance

  • resolve (bool) – if True, will return a string with the interpolations resolved, otherwise interpolations are preserved

  • sort_keys (bool) – If True, will print dict keys in sorted order. default False.

Returns:

A string containing the yaml representation.

Return type:

str

static unsafe_merge(*configs)#

Merge a list of previously created configs into a single one. This is much faster than OmegaConf.merge() as the input configs are not copied. However, the input configs must not be used after this operation as they will become inconsistent.

Parameters:

configs (DictConfig | ListConfig | Dict[str | bytes | int | Enum | float | bool, Any] | List[Any] | Tuple[Any, ...] | Any) – Input configs

Returns:

the merged config object.

Return type:

ListConfig | DictConfig

static update(cfg, key, value=None, *, merge=True, force_add=False)#

Updates a dot separated key sequence to a value

Parameters:
  • cfg (Container) – input config to update

  • key (str) – key to update (can be a dot separated path)

  • value (Any) – value to set; if value is a list or a dict, it will be merged or set depending on merge

  • merge (bool) – If value is a dict or a list, True (default) to merge into the destination, False to replace the destination.

  • force_add (bool) – insert the entire path regardless of Struct flag or Structured Config nodes.
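
The effect of the merge flag is easiest to see side by side. The following sketch imitates it on plain nested dictionaries: the helper update_path is hypothetical, it only handles dict values, and its merge is shallow, so it should be read as an illustration of the idea rather than of OmegaConf's behavior in every case.

```python
def update_path(cfg, key, value, merge=True):
    """Set `value` at a dot-separated `key` inside a nested dict.

    With merge=True a dict value is merged (shallowly, in this sketch)
    into an existing dict; with merge=False it replaces the destination.
    Hypothetical helper, not OmegaConf's implementation.
    """
    *parents, leaf = key.split(".")
    node = cfg
    for part in parents:
        node = node.setdefault(part, {})  # create missing parents
    if merge and isinstance(value, dict) and isinstance(node.get(leaf), dict):
        node[leaf].update(value)
    else:
        node[leaf] = value
    return cfg


merged = update_path(
    {"train": {"opt": {"lr": 0.1, "name": "sgd"}}}, "train.opt", {"lr": 0.01}
)
replaced = update_path(
    {"train": {"opt": {"lr": 0.1, "name": "sgd"}}}, "train.opt", {"lr": 0.01}, merge=False
)
print(merged)
print(replaced)
```

With merging, the untouched key name survives; with replacement, only the new dictionary remains at the destination.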

__init__()#

Downloads a file from a link and saves it to the specified path.

Parameters:
file_link : str

The link of the file to download.

path_to_save : str

The path where the downloaded file will be saved.

dataset_name : str

The name of the dataset.

file_format : str, optional

The format of the downloaded file. Defaults to “tar.gz”.

Raises:
None
extract_gz(path, folder, log=True)#

Extracts a gz archive to a specific folder.

Parameters:
  • path (str) – The path to the gz archive.

  • folder (str) – The folder.

  • log (bool, optional) – If False, will not print anything to the console. (default: True)
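
For readers who want to see what extracting a gz archive into a folder involves, here is a standard-library sketch. The function gunzip_to_folder is a hypothetical stand-in, not the extract_gz documented above, and naming the output by stripping the .gz suffix is an assumption of this sketch.

```python
import gzip
import os
import shutil
import tempfile


def gunzip_to_folder(path, folder):
    """Decompress `path` (a .gz file) into `folder` and return the new path.

    Hypothetical stand-in written with the standard library only.
    """
    os.makedirs(folder, exist_ok=True)
    # Strip the trailing ".gz" to name the output file.
    out_name = os.path.basename(path)[: -len(".gz")]
    out_path = os.path.join(folder, out_name)
    with gzip.open(path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream, so large files are fine
    return out_path


with tempfile.TemporaryDirectory() as tmp:
    # Create a tiny gzip file to extract.
    archive = os.path.join(tmp, "2_manifolds.json.gz")
    with gzip.open(archive, "wb") as f:
        f.write(b'[{"id": 0}]')

    extracted = gunzip_to_folder(archive, os.path.join(tmp, "raw"))
    extracted_name = os.path.basename(extracted)
    with open(extracted, "rb") as f:
        content = f.read()

print(extracted_name, content)
```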

read_ndim_manifolds(path, dim, y_val='betti_numbers', neighborhoods=None, signed=True, slice=None)#

Load MANTRA dataset.

Parameters:
path : str

Path to the dataset.

dim : int

Dimension of the manifolds to load, required to make sanity checks.

y_val : str, optional

The triangulation information to use as label. Can be one of [‘betti_numbers’, ‘torsion_coefficients’, ‘name’, ‘genus’, ‘orientable’] (default: “betti_numbers”).

neighborhoods : list of str, optional

The connectivity to consider when building the simplicial complex (default: None, which means all).

signed : bool, optional

Whether to consider signed incidence matrices (default: True).

slice : int, optional

Slice of the dataset to load. If None, load the entire dataset (default: None). Used for testing.

Returns:
torch_geometric.data.Data

Data object of the manifold for the MANTRA dataset.
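
The difference between signed and unsigned incidence matrices can be seen on a single triangle. The sketch below builds the vertex-by-edge incidence matrix from a list of triangles. The helper vertex_edge_incidence is hypothetical, and the sign convention (-1 at the smaller vertex, +1 at the larger) is an assumption of this sketch that may differ from the one used by read_ndim_manifolds.

```python
from itertools import combinations


def vertex_edge_incidence(triangles, signed=True):
    """Build the vertex-by-edge incidence matrix of a triangulation.

    Each edge (a, b) with a < b gets -1 at row a and +1 at row b when
    `signed` is True, and 1 in both rows otherwise. Illustrative helper,
    not the function documented above.
    """
    vertices = sorted({v for tri in triangles for v in tri})
    edges = sorted({e for tri in triangles for e in combinations(sorted(tri), 2)})
    row = {v: i for i, v in enumerate(vertices)}
    matrix = [[0] * len(edges) for _ in vertices]
    for col, (a, b) in enumerate(edges):
        matrix[row[a]][col] = -1 if signed else 1
        matrix[row[b]][col] = 1
    return edges, matrix


edges, signed_matrix = vertex_edge_incidence([[0, 1, 2]], signed=True)
_, unsigned_matrix = vertex_edge_incidence([[0, 1, 2]], signed=False)
print(edges)
print(signed_matrix)
print(unsigned_matrix)
```

The unsigned matrix is the entrywise absolute value of the signed one, so it records which vertices bound which edges but loses the orientation.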