topobench.data.utils package#

Submodules#

topobench.data.utils.io_utils module#

Data IO utilities.

topobench.data.utils.io_utils.download_file_from_drive(file_link, path_to_save, dataset_name, file_format='tar.gz')[source]#

Download a file from a Google Drive link and save it to the specified path.

Parameters:

file_link (str): The Google Drive link of the file to download.
path_to_save (str): The path where the downloaded file will be saved.
dataset_name (str): The name of the dataset.
file_format (str, optional): The format of the downloaded file. Defaults to “tar.gz”.

Raises:

None

topobench.data.utils.io_utils.download_file_from_link(file_link, path_to_save, dataset_name, file_format='tar.gz')[source]#

Download a file from a link and save it to the specified path.

Parameters:

file_link (str): The link of the file to download.
path_to_save (str): The path where the downloaded file will be saved.
dataset_name (str): The name of the dataset.
file_format (str, optional): The format of the downloaded file. Defaults to “tar.gz”.

Raises:

None

topobench.data.utils.io_utils.get_file_id_from_url(url)[source]#

Extract the file ID from a Google Drive file URL.

Parameters:

url (str): The Google Drive file URL.

Returns:

str: The file ID extracted from the URL.

Raises:

ValueError: If the provided URL is not a valid Google Drive file URL.
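
The kind of parsing described above can be illustrated with a minimal sketch. This is not TopoBench's implementation: the helper name `extract_drive_file_id` is hypothetical, and the sketch assumes only the two common Google Drive URL shapes (`/file/d/<id>/...` and `?id=<id>`).

```python
import re
from urllib.parse import parse_qs, urlparse


def extract_drive_file_id(url: str) -> str:
    """Return the file ID embedded in a Google Drive file URL (hypothetical helper)."""
    parsed = urlparse(url)
    # Shape 1 (assumed): https://drive.google.com/file/d/<id>/view
    match = re.search(r"/file/d/([A-Za-z0-9_-]+)", parsed.path)
    if match:
        return match.group(1)
    # Shape 2 (assumed): https://drive.google.com/uc?id=<id>
    ids = parse_qs(parsed.query).get("id")
    if ids:
        return ids[0]
    # Mirror the documented behaviour: reject anything that is not recognised.
    raise ValueError(f"Not a recognised Google Drive file URL: {url!r}")
```

Raising ValueError for unrecognised input follows the documented contract; the regular expression and query-string handling are illustrative choices only.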

topobench.data.utils.io_utils.load_hypergraph_content_dataset(data_dir, data_name)[source]#

Load hypergraph datasets from pickle files.

Parameters:

data_dir (str): Path to data.
data_name (str): Name of the dataset.

Returns:

torch_geometric.data.Data: Hypergraph dataset.

topobench.data.utils.io_utils.load_hypergraph_pickle_dataset(data_dir, data_name)[source]#

Load hypergraph datasets from pickle files.

Parameters:

data_dir (str): Path to data.
data_name (str): Name of the dataset.

Returns:

torch_geometric.data.Data: Hypergraph dataset.

topobench.data.utils.io_utils.read_ndim_manifolds(path, dim, y_val='betti_numbers', slice=None, load_as_graph=False)[source]#

Load MANTRA dataset.

Parameters:

path (str): Path to the dataset.
dim (int): Dimension of the manifolds to load; used for sanity checks.
y_val (str, optional): The triangulation information to use as label. Can be one of [‘betti_numbers’, ‘torsion_coefficients’, ‘name’, ‘genus’, ‘orientable’] (default: “betti_numbers”).
slice (int, optional): Slice of the dataset to load. If None, load the entire dataset (default: None). Used for testing.
load_as_graph (bool, optional): If True, load the MANTRA dataset as a graph. Useful when arbitrary graph liftings need to be applied (default: False).

Returns:

torch_geometric.data.Data: Data object of the manifold for the MANTRA dataset.

topobench.data.utils.io_utils.read_us_county_demos(path, year=2012, y_col='Election')[source]#

Load US County Demos dataset.

Parameters:

path (str): Path to the dataset.
year (int, optional): Year to load the features (default: 2012).
y_col (str, optional): Column to use as label. Can be one of [‘Election’, ‘MedianIncome’, ‘MigraRate’, ‘BirthRate’, ‘DeathRate’, ‘BachelorRate’, ‘UnemploymentRate’] (default: “Election”).

Returns:

torch_geometric.data.Data: Data object of the graph for the US County Demos dataset.

topobench.data.utils.split_utils module#

Split utilities.

topobench.data.utils.split_utils.assign_train_val_test_mask_to_graphs(dataset, split_idx)[source]#

Split the graph dataset into train, validation, and test datasets.

Parameters:

dataset (torch_geometric.data.Dataset): Considered dataset.
split_idx (dict): Dictionary containing the train, validation, and test indices.

Returns:

tuple: Tuple containing the train, validation, and test datasets.

topobench.data.utils.split_utils.k_fold_split(labels, parameters)[source]#

Return train and valid indices as in K-Fold Cross-Validation.

If the split file already exists, it is loaded automatically; otherwise, it is created and saved for subsequent runs.

Parameters:

labels (torch.Tensor): Label tensor.
parameters (DictConfig): Configuration parameters.

Returns:

dict: Dictionary containing the train, validation and test indices, with keys “train”, “valid”, and “test”.
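
The idea of K-fold index generation can be sketched in plain Python. This is an illustration under stated assumptions, not the TopoBench implementation: the function name `k_fold_indices` and its arguments are hypothetical, the sketch is not stratified by label, it does not cache splits to disk, and it simply reuses the validation fold for the “test” key because the handling of that key is not specified here.

```python
import random


def k_fold_indices(n_samples: int, k: int, fold: int, seed: int = 0) -> dict:
    """Split range(n_samples) into k folds and return the split for one fold."""
    if not 0 <= fold < k:
        raise ValueError("fold must be in [0, k)")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # deterministic shuffle for a given seed
    # Fold i takes every k-th shuffled index starting at offset i.
    folds = [indices[i::k] for i in range(k)]
    valid = sorted(folds[fold])
    train = sorted(i for j, f in enumerate(folds) if j != fold for i in f)
    # Assumption: the held-out fold doubles as the test split.
    return {"train": train, "valid": valid, "test": valid}
```

Across the k choices of `fold`, every sample lands in the validation fold exactly once, which is the defining property of K-fold cross-validation.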

topobench.data.utils.split_utils.load_coauthorship_hypergraph_splits(data, parameters, train_prop=0.5)[source]#

Load the split generated by the rand_train_test_idx function.

Parameters:

data (torch_geometric.data.Data): Graph dataset.
parameters (DictConfig): Configuration parameters.
train_prop (float, optional): Proportion of training data (default: 0.5).

Returns:

torch_geometric.data.Data: Graph dataset with the specified split.

topobench.data.utils.split_utils.load_inductive_splits(dataset, parameters)[source]#

Load multiple-graph datasets with the specified split.

Parameters:

dataset (torch_geometric.data.Dataset): Graph dataset.
parameters (DictConfig): Configuration parameters.

Returns:

list: List containing the train, validation, and test splits.

topobench.data.utils.split_utils.load_transductive_splits(dataset, parameters)[source]#

Load the graph dataset with the specified split.

Parameters:

dataset (torch_geometric.data.Dataset): Graph dataset.
parameters (DictConfig): Configuration parameters.

Returns:

list: List containing the train, validation, and test splits.

topobench.data.utils.split_utils.random_splitting(labels, parameters, global_data_seed=42)[source]#

Randomly split labels into train/valid/test splits.

Adapted from CUAI/Non-Homophily-Benchmarks.

Parameters:

labels (torch.Tensor): Label tensor.
parameters (DictConfig): Configuration parameters.
global_data_seed (int, optional): Seed for the random number generator (default: 42).

Returns:

dict: Dictionary containing the train, validation and test indices with keys “train”, “valid”, and “test”.
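
A seeded random split of this kind can be sketched as follows. This is a minimal illustration, not the library's code: the function name `random_split_indices` and the `train_prop` and `valid_prop` arguments are hypothetical, since the fields read from `parameters` are not listed in this reference.

```python
import random


def random_split_indices(n_samples, train_prop=0.5, valid_prop=0.25, seed=42):
    """Shuffle indices with a fixed seed and cut them into three contiguous parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # same seed gives the same split
    n_train = int(n_samples * train_prop)
    n_valid = int(n_samples * valid_prop)
    return {
        "train": indices[:n_train],
        "valid": indices[n_train:n_train + n_valid],
        "test": indices[n_train + n_valid:],  # whatever remains
    }
```

Fixing the seed makes the split reproducible across runs, which is the role the `global_data_seed` argument appears to play in the documented function.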

topobench.data.utils.utils module#

Data utilities.

topobench.data.utils.utils.data2simplicial(data)[source]#

Convert a data dictionary into a SimplicialComplex object.

Parameters:

data (dict): A dictionary containing at least ‘incidence_0’, ‘adjacency_0’, ‘incidence_1’, ‘incidence_2’, and optionally ‘incidence_3’ tensors.

Returns:

SimplicialComplex: A SimplicialComplex object constructed from nodes, edges, triangles, and tetrahedrons.

topobench.data.utils.utils.ensure_serializable(obj)[source]#

Ensure that the object is serializable.

Parameters:

obj (object): Object to make serializable.

Returns:

object: Object that is serializable.

topobench.data.utils.utils.find_tetrahedrons(incidence_1, incidence_2, incidence_3)[source]#

Identify tetrahedrons in the simplicial complex.

Parameters:

incidence_1 (torch.Tensor): Incidence matrix of edges.
incidence_2 (torch.Tensor): Incidence matrix of triangles.
incidence_3 (torch.Tensor): Incidence matrix of tetrahedrons.

Returns:

list of list: List of tetrahedrons, where each is represented as a list of four node indices.

topobench.data.utils.utils.find_triangles(incidence_1, incidence_2)[source]#

Identify triangles in the simplicial complex based on incidence matrices.

Parameters:

incidence_1 (torch.Tensor): Incidence matrix of edges.
incidence_2 (torch.Tensor): Incidence matrix of triangles.

Returns:

list of list: List of triangles, where each triangle is a list of three node indices.
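
How triangles can be read off a pair of incidence matrices is sketched below with plain nested lists instead of tensors. This is an illustration only, not the TopoBench implementation: the function name `triangles_from_incidence` is hypothetical, and the sketch assumes that incidence_1 has shape nodes x edges and incidence_2 has shape edges x triangles, with any nonzero entry marking membership.

```python
def triangles_from_incidence(incidence_1, incidence_2):
    """Recover triangle node triples from dense incidence matrices (nested lists)."""
    n_nodes = len(incidence_1)
    n_edges = len(incidence_2)
    n_triangles = len(incidence_2[0]) if n_edges else 0
    triangles = []
    for t in range(n_triangles):
        # Edges on the boundary of triangle t (nonzero entries in column t).
        edges = [e for e in range(n_edges) if incidence_2[e][t] != 0]
        # Nodes touched by any of those edges.
        nodes = {
            v for e in edges for v in range(n_nodes) if incidence_1[v][e] != 0
        }
        triangles.append(sorted(nodes))
    return triangles


# Example: edges e0=(0,1), e1=(0,2), e2=(1,2), e3=(2,3); one triangle on e0, e1, e2.
incidence_1 = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
]
incidence_2 = [[1], [1], [1], [0]]
triangles = triangles_from_incidence(incidence_1, incidence_2)  # → [[0, 1, 2]]
```

Testing for nonzero entries rather than for 1 means the same sketch also works with signed incidence matrices.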

topobench.data.utils.utils.generate_zero_sparse_connectivity(m, n)[source]#

Generate a zero sparse connectivity matrix.

Parameters:

m (int): Number of rows.
n (int): Number of columns.

Returns:

torch.sparse_coo_tensor: Zero sparse connectivity matrix.

topobench.data.utils.utils.get_combinatorial_complex_connectivity(complex, max_rank, neighborhoods=None)[source]#

Get the connectivity matrices for the Combinatorial Complex.

Parameters:

complex (toponetx.CombinatorialComplex): Combinatorial complex.
max_rank (int): Maximum rank of the complex.
neighborhoods (list, optional): List of neighborhoods of interest.

Returns:

dict: Dictionary containing the connectivity matrices.

topobench.data.utils.utils.get_complex_connectivity(complex, max_rank, neighborhoods=None, signed=False)[source]#

Get the connectivity matrices for the complex.

Parameters:

complex (toponetx.CellComplex or toponetx.SimplicialComplex): Cell or simplicial complex.
max_rank (int): Maximum rank of the complex.
neighborhoods (list, optional): List of neighborhoods of interest.
signed (bool, optional): If True, returns signed connectivity matrices (default: False).

Returns:

dict: Dictionary containing the connectivity matrices.

topobench.data.utils.utils.get_routes_from_neighborhoods(neighborhoods)[source]#

Get the routes from the neighborhoods.

Each route is a [src_rank, dst_rank] pair, e.g. [[0, 0], [1, 0], [1, 1], [1, 1], [2, 1]].

Parameters:

neighborhoods (list): List of neighborhoods of interest.

Returns:

list: List of routes.

topobench.data.utils.utils.load_cell_complex_dataset(cfg)[source]#

Load cell complex datasets.

Parameters:

cfg (DictConfig): Configuration parameters.

topobench.data.utils.utils.load_manual_graph()[source]#

Create a manual graph for testing purposes.

Returns:

torch_geometric.data.Data: Manual graph.

topobench.data.utils.utils.load_manual_graph_second_structure()[source]#

Create a manual graph for testing purposes with updated edges and node features.

Returns:

torch_geometric.data.Data: A simple graph data object.

topobench.data.utils.utils.load_manual_hypergraph()[source]#

Create a manual hypergraph for testing purposes.

Returns:

torch_geometric.data.Data: Manual hypergraph.

topobench.data.utils.utils.load_manual_pointcloud(pos_to_x: bool = False)[source]#

Create a manual pointcloud for testing purposes.

Parameters:

pos_to_x (bool, optional): If True, the positions are used as features (default: False).

Returns:

torch_geometric.data.Data: Manual pointcloud.

topobench.data.utils.utils.load_manual_points()[source]#

Create a manual point cloud for testing purposes.

Returns:

torch_geometric.data.Data: Manual point cloud.

topobench.data.utils.utils.load_manual_simplicial_complex()[source]#

Create a manual simplicial complex for testing purposes.

Returns:

torch_geometric.data.Data: Manual simplicial complex.

topobench.data.utils.utils.load_simplicial_dataset(cfg)[source]#

Load simplicial datasets.

Parameters:

cfg (DictConfig): Configuration parameters.

Returns:

torch_geometric.data.Data: Simplicial dataset.

topobench.data.utils.utils.make_hash(o)[source]#

Make a hash from a dictionary, list, tuple, or set nested to any depth, provided it contains only other hashable types.

Parameters:

o (dict, list, tuple, or set): Object to hash.

Returns:

int: Hash of the object.
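
Hashing a nested container usually means recursing into it and hashing an immutable stand-in at each level. The following is a minimal sketch of that idea, not the TopoBench implementation; the name `nested_hash` is hypothetical, and treating dictionaries and sets as order-insensitive while sequences stay order-sensitive is a design choice made for this sketch.

```python
def nested_hash(obj):
    """Hash dicts, lists, tuples and sets nested to any depth (hypothetical helper)."""
    if isinstance(obj, dict):
        # Key order must not matter, so hash an unordered set of
        # (key, hash-of-value) pairs.
        return hash(frozenset((key, nested_hash(value)) for key, value in obj.items()))
    if isinstance(obj, (set, frozenset)):
        return hash(frozenset(nested_hash(item) for item in obj))
    if isinstance(obj, (list, tuple)):
        # Element order matters for sequences.
        return hash(tuple(nested_hash(item) for item in obj))
    return hash(obj)  # leaves must already be hashable
```

Note that Python randomises string hashes per process unless PYTHONHASHSEED is set, so values produced this way are only comparable within a single run.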

topobench.data.utils.utils.select_neighborhoods_of_interest(connectivity, neighborhoods)[source]#

Select the neighborhoods of interest.

Parameters:

connectivity (dict): Connectivity matrices generated by default.
neighborhoods (list): List of neighborhoods of interest.

Returns:

dict: Connectivity matrices of interest.

Module contents#

Init file for data/utils module.

topobench.data.utils.data2simplicial(data)[source]#

Convert a data dictionary into a SimplicialComplex object.

Parameters:

data (dict): A dictionary containing at least ‘incidence_0’, ‘adjacency_0’, ‘incidence_1’, ‘incidence_2’, and optionally ‘incidence_3’ tensors.

Returns:

SimplicialComplex: A SimplicialComplex object constructed from nodes, edges, triangles, and tetrahedrons.

topobench.data.utils.download_file_from_drive(file_link, path_to_save, dataset_name, file_format='tar.gz')[source]#

Download a file from a Google Drive link and save it to the specified path.

Parameters:

file_link (str): The Google Drive link of the file to download.
path_to_save (str): The path where the downloaded file will be saved.
dataset_name (str): The name of the dataset.
file_format (str, optional): The format of the downloaded file. Defaults to “tar.gz”.

Raises:

None

topobench.data.utils.ensure_serializable(obj)[source]#

Ensure that the object is serializable.

Parameters:

obj (object): Object to make serializable.

Returns:

object: Object that is serializable.

topobench.data.utils.generate_zero_sparse_connectivity(m, n)[source]#

Generate a zero sparse connectivity matrix.

Parameters:

m (int): Number of rows.
n (int): Number of columns.

Returns:

torch.sparse_coo_tensor: Zero sparse connectivity matrix.

topobench.data.utils.get_combinatorial_complex_connectivity(complex, max_rank, neighborhoods=None)[source]#

Get the connectivity matrices for the Combinatorial Complex.

Parameters:

complex (toponetx.CombinatorialComplex): Combinatorial complex.
max_rank (int): Maximum rank of the complex.
neighborhoods (list, optional): List of neighborhoods of interest.

Returns:

dict: Dictionary containing the connectivity matrices.

topobench.data.utils.get_complex_connectivity(complex, max_rank, neighborhoods=None, signed=False)[source]#

Get the connectivity matrices for the complex.

Parameters:

complex (toponetx.CellComplex or toponetx.SimplicialComplex): Cell or simplicial complex.
max_rank (int): Maximum rank of the complex.
neighborhoods (list, optional): List of neighborhoods of interest.
signed (bool, optional): If True, returns signed connectivity matrices (default: False).

Returns:

dict: Dictionary containing the connectivity matrices.

topobench.data.utils.get_routes_from_neighborhoods(neighborhoods)[source]#

Get the routes from the neighborhoods.

Each route is a [src_rank, dst_rank] pair, e.g. [[0, 0], [1, 0], [1, 1], [1, 1], [2, 1]].

Parameters:

neighborhoods (list): List of neighborhoods of interest.

Returns:

list: List of routes.

topobench.data.utils.load_cell_complex_dataset(cfg)[source]#

Load cell complex datasets.

Parameters:

cfg (DictConfig): Configuration parameters.

topobench.data.utils.load_coauthorship_hypergraph_splits(data, parameters, train_prop=0.5)[source]#

Load the split generated by the rand_train_test_idx function.

Parameters:

data (torch_geometric.data.Data): Graph dataset.
parameters (DictConfig): Configuration parameters.
train_prop (float, optional): Proportion of training data (default: 0.5).

Returns:

torch_geometric.data.Data: Graph dataset with the specified split.

topobench.data.utils.load_hypergraph_content_dataset(data_dir, data_name)[source]#

Load hypergraph datasets from pickle files.

Parameters:

data_dir (str): Path to data.
data_name (str): Name of the dataset.

Returns:

torch_geometric.data.Data: Hypergraph dataset.

topobench.data.utils.load_hypergraph_pickle_dataset(data_dir, data_name)[source]#

Load hypergraph datasets from pickle files.

Parameters:

data_dir (str): Path to data.
data_name (str): Name of the dataset.

Returns:

torch_geometric.data.Data: Hypergraph dataset.

topobench.data.utils.load_inductive_splits(dataset, parameters)[source]#

Load multiple-graph datasets with the specified split.

Parameters:

dataset (torch_geometric.data.Dataset): Graph dataset.
parameters (DictConfig): Configuration parameters.

Returns:

list: List containing the train, validation, and test splits.

topobench.data.utils.load_manual_graph()[source]#

Create a manual graph for testing purposes.

Returns:

torch_geometric.data.Data: Manual graph.

topobench.data.utils.load_simplicial_dataset(cfg)[source]#

Load simplicial datasets.

Parameters:

cfg (DictConfig): Configuration parameters.

Returns:

torch_geometric.data.Data: Simplicial dataset.

topobench.data.utils.load_transductive_splits(dataset, parameters)[source]#

Load the graph dataset with the specified split.

Parameters:

dataset (torch_geometric.data.Dataset): Graph dataset.
parameters (DictConfig): Configuration parameters.

Returns:

list: List containing the train, validation, and test splits.

topobench.data.utils.make_hash(o)[source]#

Make a hash from a dictionary, list, tuple, or set nested to any depth, provided it contains only other hashable types.

Parameters:

o (dict, list, tuple, or set): Object to hash.

Returns:

int: Hash of the object.

topobench.data.utils.read_us_county_demos(path, year=2012, y_col='Election')[source]#

Load US County Demos dataset.

Parameters:

path (str): Path to the dataset.
year (int, optional): Year to load the features (default: 2012).
y_col (str, optional): Column to use as label. Can be one of [‘Election’, ‘MedianIncome’, ‘MigraRate’, ‘BirthRate’, ‘DeathRate’, ‘BachelorRate’, ‘UnemploymentRate’] (default: “Election”).

Returns:

torch_geometric.data.Data: Data object of the graph for the US County Demos dataset.

topobench.data.utils.select_neighborhoods_of_interest(connectivity, neighborhoods)[source]#

Select the neighborhoods of interest.

Parameters:

connectivity (dict): Connectivity matrices generated by default.
neighborhoods (list): List of neighborhoods of interest.

Returns:

dict: Connectivity matrices of interest.
