topobench.data.utils package
Submodules
topobench.data.utils.io_utils module
Data IO utilities.
- topobench.data.utils.io_utils.download_file_from_drive(file_link, path_to_save, dataset_name, file_format='tar.gz')
Download a file from a Google Drive link and save it to the specified path.
- Parameters:
- file_link : str
The Google Drive link of the file to download.
- path_to_save : str
The path where the downloaded file will be saved.
- dataset_name : str
The name of the dataset.
- file_format : str, optional
The format of the downloaded file. Defaults to “tar.gz”.
- Raises:
- None
- topobench.data.utils.io_utils.download_file_from_link(file_link, path_to_save, dataset_name, file_format='tar.gz')
Download a file from a link and save it to the specified path.
- Parameters:
- file_link : str
The link of the file to download.
- path_to_save : str
The path where the downloaded file will be saved.
- dataset_name : str
The name of the dataset.
- file_format : str, optional
The format of the downloaded file. Defaults to “tar.gz”.
- Raises:
- None
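As an illustration of the download-and-save pattern described above, here is a minimal standard-library sketch. The helper name `download_to_path` and the output naming scheme `<path_to_save>/<dataset_name>.<file_format>` are assumptions made for this example; they are not taken from the TopoBench source.

```python
import os
import shutil
import urllib.request


def download_to_path(file_link, path_to_save, dataset_name, file_format="tar.gz"):
    """Stream `file_link` into `<path_to_save>/<dataset_name>.<file_format>`.

    Hypothetical helper: the file naming scheme is an assumption of this sketch.
    """
    os.makedirs(path_to_save, exist_ok=True)
    target = os.path.join(path_to_save, f"{dataset_name}.{file_format}")
    # urlopen raises urllib.error.URLError (or HTTPError) if the request fails.
    with urllib.request.urlopen(file_link) as response, open(target, "wb") as out:
        # Copy in chunks so large archives are not held in memory at once.
        shutil.copyfileobj(response, out)
    return target
```

Because `urllib.request.urlopen` also accepts `file://` URLs, the sketch can be exercised locally without network access.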
- topobench.data.utils.io_utils.get_file_id_from_url(url)
Extract the file ID from a Google Drive file URL.
- Parameters:
- url : str
The Google Drive file URL.
- Returns:
- str
The file ID extracted from the URL.
- Raises:
- ValueError
If the provided URL is not a valid Google Drive file URL.
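The behavior described above (return the ID, or raise `ValueError` for an invalid URL) can be sketched as follows. This is not the library's implementation: the helper name `extract_drive_file_id`, the accepted hosts, and the two URL shapes handled (`/d/<id>/` in the path and `?id=<id>` in the query string) are assumptions chosen for illustration.

```python
import re
from urllib.parse import parse_qs, urlparse

# Matches the ID segment in paths such as /file/d/<id>/view.
_PATH_ID = re.compile(r"/d/([A-Za-z0-9_-]+)")


def extract_drive_file_id(url):
    """Return the file ID from a Google Drive URL (hypothetical helper)."""
    parsed = urlparse(url)
    if parsed.netloc not in {"drive.google.com", "docs.google.com"}:
        raise ValueError(f"Not a Google Drive URL: {url!r}")
    match = _PATH_ID.search(parsed.path)
    if match:
        return match.group(1)
    # Fall back to query-string links such as /uc?id=<id>.
    ids = parse_qs(parsed.query).get("id")
    if ids:
        return ids[0]
    raise ValueError(f"No file ID found in URL: {url!r}")
```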
- topobench.data.utils.io_utils.load_hypergraph_pickle_dataset(data_dir, data_name)
Load hypergraph datasets from pickle files.
- Parameters:
- data_dir : str
Path to data.
- data_name : str
Name of the dataset.
- Returns:
- torch_geometric.data.Data
Hypergraph dataset.
- topobench.data.utils.io_utils.read_ndim_manifolds(path, dim, y_val='betti_numbers', slice=None, load_as_graph=False)
Load the MANTRA dataset.
- Parameters:
- path : str
Path to the dataset.
- dim : int
Dimension of the manifolds to load, required to make sanity checks.
- y_val : str, optional
The triangulation information to use as label. Can be one of [‘betti_numbers’, ‘torsion_coefficients’, ‘name’, ‘genus’, ‘orientable’] (default: “betti_numbers”).
- slice : int, optional
Slice of the dataset to load. If None, load the entire dataset (default: None). Used for testing.
- load_as_graph : bool, optional
Load the MANTRA dataset as a graph. Useful when arbitrary graph liftings need to be used (default: False).
- Returns:
- torch_geometric.data.Data
Data object of the manifold for the MANTRA dataset.
- topobench.data.utils.io_utils.read_us_county_demos(path, year=2012, y_col='Election')
Load US County Demos dataset.
- Parameters:
- path : str
Path to the dataset.
- year : int, optional
Year to load the features (default: 2012).
- y_col : str, optional
Column to use as label. Can be one of [‘Election’, ‘MedianIncome’, ‘MigraRate’, ‘BirthRate’, ‘DeathRate’, ‘BachelorRate’, ‘UnemploymentRate’] (default: “Election”).
- Returns:
- torch_geometric.data.Data
Data object of the graph for the US County Demos dataset.
topobench.data.utils.split_utils module
Split utilities.
- topobench.data.utils.split_utils.assign_train_val_test_mask_to_graphs(dataset, split_idx)
Split the graph dataset into train, validation, and test datasets.
- Parameters:
- dataset : torch_geometric.data.Dataset
Considered dataset.
- split_idx : dict
Dictionary containing the train, validation, and test indices.
- Returns:
- tuple
Tuple containing the train, validation, and test datasets.
- topobench.data.utils.split_utils.k_fold_split(labels, parameters)
Return train and valid indices as in K-Fold Cross-Validation.
If the split file already exists, it is loaded automatically; otherwise, it is created and saved for subsequent runs.
- Parameters:
- labels : torch.Tensor
Label tensor.
- parameters : DictConfig
Configuration parameters.
- Returns:
- dict
Dictionary containing the train, validation and test indices, with keys “train”, “valid”, and “test”.
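The load-or-create behavior and the returned dictionary layout can be sketched in plain Python. Everything here is an assumption made for illustration: the helper name `k_fold_indices`, its arguments, the JSON cache file name, the unstratified fold assignment, and the reuse of the held-out fold for both “valid” and “test”. The real function takes a label tensor and a `DictConfig` instead.

```python
import json
import os
import random


def k_fold_indices(n_samples, k, fold, seed=0, cache_dir=None):
    """Return {"train", "valid", "test"} index lists for one fold (hypothetical)."""
    cache_file = None
    if cache_dir is not None:
        os.makedirs(cache_dir, exist_ok=True)
        name = f"n{n_samples}_k{k}_seed{seed}_fold{fold}.json"
        cache_file = os.path.join(cache_dir, name)
        if os.path.exists(cache_file):
            # The split was created by an earlier run: reuse it.
            with open(cache_file) as handle:
                return json.load(handle)

    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    held_out = sorted(folds[fold])
    train = sorted(i for j, part in enumerate(folds) if j != fold for i in part)
    # Assumption of this sketch: the held-out fold serves as both valid and test.
    split = {"train": train, "valid": held_out, "test": held_out}

    if cache_file is not None:
        with open(cache_file, "w") as handle:
            json.dump(split, handle)
    return split
```

Keying the cache file on the sample count, fold count, seed, and fold index keeps a stale split from being reused after any of those settings change.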
- topobench.data.utils.split_utils.load_coauthorship_hypergraph_splits(data, parameters, train_prop=0.5)
Load the split generated by the rand_train_test_idx function.
- Parameters:
- data : torch_geometric.data.Data
Graph dataset.
- parameters : DictConfig
Configuration parameters.
- train_prop : float, optional
Proportion of training data (default: 0.5).
- Returns:
- torch_geometric.data.Data
Graph dataset with the specified split.
- topobench.data.utils.split_utils.load_inductive_splits(dataset, parameters)
Load multiple-graph datasets with the specified split.
- Parameters:
- dataset : torch_geometric.data.Dataset
Graph dataset.
- parameters : DictConfig
Configuration parameters.
- Returns:
- list
List containing the train, validation, and test splits.
- topobench.data.utils.split_utils.load_transductive_splits(dataset, parameters)
Load the graph dataset with the specified split.
- Parameters:
- dataset : torch_geometric.data.Dataset
Graph dataset.
- parameters : DictConfig
Configuration parameters.
- Returns:
- list
List containing the train, validation, and test splits.
- topobench.data.utils.split_utils.random_splitting(labels, parameters, global_data_seed=42)
Randomly split the labels into train/valid/test splits.
Adapted from CUAI/Non-Homophily-Benchmarks.
- Parameters:
- labels : torch.Tensor
Label tensor.
- parameters : DictConfig
Configuration parameters.
- global_data_seed : int, optional
Seed for the random number generator (default: 42).
- Returns:
- dict
Dictionary containing the train, validation and test indices with keys “train”, “valid”, and “test”.
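A seeded random split with the same dictionary layout might look like the sketch below. The helper name `random_split_indices` and the `train_prop` and `valid_prop` arguments are hypothetical; in the documented function the proportions would come from the `parameters` configuration, whose fields are not listed here.

```python
import random


def random_split_indices(n_samples, train_prop=0.5, valid_prop=0.25, seed=42):
    """Shuffle indices with a fixed seed and cut them into three parts (hypothetical)."""
    if train_prop < 0 or valid_prop < 0 or train_prop + valid_prop > 1:
        raise ValueError("train_prop and valid_prop must be non-negative and sum to at most 1")
    indices = list(range(n_samples))
    # A dedicated Random instance keeps the split reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(indices)
    n_train = int(n_samples * train_prop)
    n_valid = int(n_samples * valid_prop)
    return {
        "train": indices[:n_train],
        "valid": indices[n_train:n_train + n_valid],
        "test": indices[n_train + n_valid:],  # whatever remains
    }
```

Calling the function twice with the same seed yields the same split, which is the purpose of a parameter such as `global_data_seed`.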
topobench.data.utils.utils module
Data utilities.
- topobench.data.utils.utils.data2simplicial(data)
Convert a data dictionary into a SimplicialComplex object.
- Parameters:
- data : dict
A dictionary containing at least ‘incidence_0’, ‘adjacency_0’, ‘incidence_1’, ‘incidence_2’, and optionally ‘incidence_3’ tensors.
- Returns:
- SimplicialComplex
A SimplicialComplex object constructed from nodes, edges, triangles, and tetrahedrons.
- topobench.data.utils.utils.ensure_serializable(obj)
Ensure that the object is serializable.
- Parameters:
- obj : object
Object to make serializable.
- Returns:
- object
Object that is serializable.
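One common way to make an arbitrary nested object serializable is to convert it recursively into JSON-compatible types. The sketch below shows that general technique only. The helper name `to_serializable` and every conversion rule in it (tuples to lists, sets to sorted lists, unknown objects to their `repr`) are assumptions; the documented function may handle these cases differently.

```python
import json


def to_serializable(obj):
    """Recursively convert `obj` into JSON-compatible types (hypothetical)."""
    if obj is None or isinstance(obj, (str, int, float, bool)):
        return obj
    if isinstance(obj, dict):
        # JSON object keys must be strings.
        return {str(key): to_serializable(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_serializable(value) for value in obj]
    if isinstance(obj, (set, frozenset)):
        # Sets are unordered: sort by repr to get a stable list.
        return sorted((to_serializable(value) for value in obj), key=repr)
    # Fallback assumption: describe anything else by its repr string.
    return repr(obj)


# The converted structure can be passed straight to json.dumps.
encoded = json.dumps(to_serializable({"ranks": (0, 1), "tags": {"b", "a"}}))
```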
- topobench.data.utils.utils.find_tetrahedrons(incidence_1, incidence_2, incidence_3)
Identify tetrahedrons in the simplicial complex.
- Parameters:
- incidence_1 : torch.Tensor
Incidence matrix of edges.
- incidence_2 : torch.Tensor
Incidence matrix of triangles.
- incidence_3 : torch.Tensor
Incidence matrix of tetrahedrons.
- Returns:
- list of list
List of tetrahedrons, where each is represented as a list of four node indices.
- topobench.data.utils.utils.find_triangles(incidence_1, incidence_2)
Identify triangles in the simplicial complex based on incidence matrices.
- Parameters:
- incidence_1 : torch.Tensor
Incidence matrix of edges.
- incidence_2 : torch.Tensor
Incidence matrix of triangles.
- Returns:
- list of list
List of triangles, where each triangle is a list of three node indices.
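The idea of recovering triangles from incidence matrices can be illustrated with nested lists in place of tensors. The sketch assumes `incidence_1` has shape (nodes × edges) and `incidence_2` has shape (edges × triangles), with nonzero entries marking incidence. The helper name `triangles_from_incidence` is hypothetical, and this is not the library's implementation.

```python
def triangles_from_incidence(incidence_1, incidence_2):
    """Return each triangle as a sorted list of its node indices (hypothetical)."""
    n_nodes = len(incidence_1)
    n_edges = len(incidence_2)
    n_triangles = len(incidence_2[0]) if incidence_2 else 0
    triangles = []
    for t in range(n_triangles):
        # Edges on the boundary of triangle t (entries may be signed, so test != 0).
        edge_ids = [e for e in range(n_edges) if incidence_2[e][t] != 0]
        nodes = set()
        for e in edge_ids:
            # Endpoints of edge e.
            nodes.update(v for v in range(n_nodes) if incidence_1[v][e] != 0)
        triangles.append(sorted(nodes))
    return triangles


# Four nodes, edges (0,1), (0,2), (1,2), (2,3), and one triangle on nodes 0, 1, 2.
B1 = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
]
B2 = [[1], [1], [1], [0]]
print(triangles_from_incidence(B1, B2))  # [[0, 1, 2]]
```

The same two-step lookup (cell to boundary faces, faces to nodes) extends to tetrahedrons by adding a third incidence matrix.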
- topobench.data.utils.utils.generate_zero_sparse_connectivity(m, n)
Generate a zero sparse connectivity matrix.
- Parameters:
- m : int
Number of rows.
- n : int
Number of columns.
- Returns:
- torch.sparse_coo_tensor
Zero sparse connectivity matrix.
- topobench.data.utils.utils.get_combinatorial_complex_connectivity(complex, max_rank, neighborhoods=None)
Get the connectivity matrices for the Combinatorial Complex.
- Parameters:
- complex : toponetx.CombinatorialComplex
Combinatorial complex.
- max_rank : int
Maximum rank of the complex.
- neighborhoods : list, optional
List of neighborhoods of interest.
- Returns:
- dict
Dictionary containing the connectivity matrices.
- topobench.data.utils.utils.get_complex_connectivity(complex, max_rank, neighborhoods=None, signed=False)
Get the connectivity matrices for the complex.
- Parameters:
- complex : toponetx.CellComplex or toponetx.SimplicialComplex
Cell or simplicial complex.
- max_rank : int
Maximum rank of the complex.
- neighborhoods : list, optional
List of neighborhoods of interest.
- signed : bool, optional
If True, returns signed connectivity matrices.
- Returns:
- dict
Dictionary containing the connectivity matrices.
- topobench.data.utils.utils.get_routes_from_neighborhoods(neighborhoods)
Get the routes from the neighborhoods.
Each route is a [src_rank, dst_rank] pair, e.g. [[0, 0], [1, 0], [1, 1], [1, 1], [2, 1]].
- Parameters:
- neighborhoods : list
List of neighborhoods of interest.
- Returns:
- list
List of routes.
- topobench.data.utils.utils.load_cell_complex_dataset(cfg)
Load cell complex datasets.
- Parameters:
- cfg : DictConfig
Configuration parameters.
- topobench.data.utils.utils.load_manual_graph()
Create a manual graph for testing purposes.
- Returns:
- torch_geometric.data.Data
Manual graph.
- topobench.data.utils.utils.load_manual_graph_second_structure()
Create a manual graph for testing purposes with updated edges and node features.
- Returns:
- torch_geometric.data.Data
A simple graph data object.
- topobench.data.utils.utils.load_manual_hypergraph()
Create a manual hypergraph for testing purposes.
- Returns:
- torch_geometric.data.Data
Manual hypergraph.
- topobench.data.utils.utils.load_manual_pointcloud(pos_to_x: bool = False)
Create a manual point cloud for testing purposes.
- Parameters:
- pos_to_x : bool, optional
If True, the positions are used as features (default: False).
- Returns:
- torch_geometric.data.Data
Manual point cloud.
- topobench.data.utils.utils.load_manual_points()
Create a manual point cloud for testing purposes.
- Returns:
- torch_geometric.data.Data
Manual point cloud.
- topobench.data.utils.utils.load_manual_simplicial_complex()
Create a manual simplicial complex for testing purposes.
- Returns:
- torch_geometric.data.Data
Manual simplicial complex.
- topobench.data.utils.utils.load_simplicial_dataset(cfg)
Load simplicial datasets.
- Parameters:
- cfg : DictConfig
Configuration parameters.
- Returns:
- torch_geometric.data.Data
Simplicial dataset.
- topobench.data.utils.utils.make_hash(o)
Make a hash from a dictionary, list, tuple, or set nested to any level, provided it contains only other hashable types.
- Parameters:
- o : dict, list, tuple, or set
Object to hash.
- Returns:
- int
Hash of the object.
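Hashing a nested container usually means reducing it to a canonical, order-independent form first. The sketch below shows one way to do that; it is not the library's implementation. The helper names `_canonical` and `stable_hash`, the use of SHA-1 over a `repr` string, and the choice to treat lists and tuples alike are all assumptions of this example.

```python
import hashlib


def _canonical(o):
    """Reduce nested containers to tagged tuples with a deterministic order."""
    if isinstance(o, dict):
        pairs = [(_canonical(k), _canonical(v)) for k, v in o.items()]
        # Sort by repr so that key insertion order does not matter.
        return ("dict", tuple(sorted(pairs, key=repr)))
    if isinstance(o, (set, frozenset)):
        return ("set", tuple(sorted((_canonical(e) for e in o), key=repr)))
    if isinstance(o, (list, tuple)):
        # Sequences keep their order; lists and tuples are treated alike here.
        return ("seq", tuple(_canonical(e) for e in o))
    return o


def stable_hash(o):
    """Return an integer hash that is stable across interpreter runs (hypothetical)."""
    digest = hashlib.sha1(repr(_canonical(o)).encode("utf-8")).hexdigest()
    return int(digest, 16)
```

A digest-based hash is used instead of the built-in `hash` because string hashes from `hash` vary between interpreter runs unless `PYTHONHASHSEED` is fixed, which matters if the value is used to name cached files.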
- topobench.data.utils.utils.select_neighborhoods_of_interest(connectivity, neighborhoods)
Select the neighborhoods of interest.
- Parameters:
- connectivity : dict
Connectivity matrices generated by default.
- neighborhoods : list
List of neighborhoods of interest.
- Returns:
- dict
Connectivity matrices of interest.
Module contents
Init file for data/utils module.
- topobench.data.utils.data2simplicial(data)
Convert a data dictionary into a SimplicialComplex object.
- Parameters:
- data : dict
A dictionary containing at least ‘incidence_0’, ‘adjacency_0’, ‘incidence_1’, ‘incidence_2’, and optionally ‘incidence_3’ tensors.
- Returns:
- SimplicialComplex
A SimplicialComplex object constructed from nodes, edges, triangles, and tetrahedrons.
- topobench.data.utils.download_file_from_drive(file_link, path_to_save, dataset_name, file_format='tar.gz')
Download a file from a Google Drive link and save it to the specified path.
- Parameters:
- file_link : str
The Google Drive link of the file to download.
- path_to_save : str
The path where the downloaded file will be saved.
- dataset_name : str
The name of the dataset.
- file_format : str, optional
The format of the downloaded file. Defaults to “tar.gz”.
- Raises:
- None
- topobench.data.utils.ensure_serializable(obj)
Ensure that the object is serializable.
- Parameters:
- obj : object
Object to make serializable.
- Returns:
- object
Object that is serializable.
- topobench.data.utils.generate_zero_sparse_connectivity(m, n)
Generate a zero sparse connectivity matrix.
- Parameters:
- m : int
Number of rows.
- n : int
Number of columns.
- Returns:
- torch.sparse_coo_tensor
Zero sparse connectivity matrix.
- topobench.data.utils.get_combinatorial_complex_connectivity(complex, max_rank, neighborhoods=None)
Get the connectivity matrices for the Combinatorial Complex.
- Parameters:
- complex : toponetx.CombinatorialComplex
Combinatorial complex.
- max_rank : int
Maximum rank of the complex.
- neighborhoods : list, optional
List of neighborhoods of interest.
- Returns:
- dict
Dictionary containing the connectivity matrices.
- topobench.data.utils.get_complex_connectivity(complex, max_rank, neighborhoods=None, signed=False)
Get the connectivity matrices for the complex.
- Parameters:
- complex : toponetx.CellComplex or toponetx.SimplicialComplex
Cell or simplicial complex.
- max_rank : int
Maximum rank of the complex.
- neighborhoods : list, optional
List of neighborhoods of interest.
- signed : bool, optional
If True, returns signed connectivity matrices.
- Returns:
- dict
Dictionary containing the connectivity matrices.
- topobench.data.utils.get_routes_from_neighborhoods(neighborhoods)
Get the routes from the neighborhoods.
Each route is a [src_rank, dst_rank] pair, e.g. [[0, 0], [1, 0], [1, 1], [1, 1], [2, 1]].
- Parameters:
- neighborhoods : list
List of neighborhoods of interest.
- Returns:
- list
List of routes.
- topobench.data.utils.load_cell_complex_dataset(cfg)
Load cell complex datasets.
- Parameters:
- cfg : DictConfig
Configuration parameters.
- topobench.data.utils.load_coauthorship_hypergraph_splits(data, parameters, train_prop=0.5)
Load the split generated by the rand_train_test_idx function.
- Parameters:
- data : torch_geometric.data.Data
Graph dataset.
- parameters : DictConfig
Configuration parameters.
- train_prop : float, optional
Proportion of training data (default: 0.5).
- Returns:
- torch_geometric.data.Data
Graph dataset with the specified split.
- topobench.data.utils.load_hypergraph_pickle_dataset(data_dir, data_name)
Load hypergraph datasets from pickle files.
- Parameters:
- data_dir : str
Path to data.
- data_name : str
Name of the dataset.
- Returns:
- torch_geometric.data.Data
Hypergraph dataset.
- topobench.data.utils.load_inductive_splits(dataset, parameters)
Load multiple-graph datasets with the specified split.
- Parameters:
- dataset : torch_geometric.data.Dataset
Graph dataset.
- parameters : DictConfig
Configuration parameters.
- Returns:
- list
List containing the train, validation, and test splits.
- topobench.data.utils.load_manual_graph()
Create a manual graph for testing purposes.
- Returns:
- torch_geometric.data.Data
Manual graph.
- topobench.data.utils.load_simplicial_dataset(cfg)
Load simplicial datasets.
- Parameters:
- cfg : DictConfig
Configuration parameters.
- Returns:
- torch_geometric.data.Data
Simplicial dataset.
- topobench.data.utils.load_transductive_splits(dataset, parameters)
Load the graph dataset with the specified split.
- Parameters:
- dataset : torch_geometric.data.Dataset
Graph dataset.
- parameters : DictConfig
Configuration parameters.
- Returns:
- list
List containing the train, validation, and test splits.
- topobench.data.utils.make_hash(o)
Make a hash from a dictionary, list, tuple, or set nested to any level, provided it contains only other hashable types.
- Parameters:
- o : dict, list, tuple, or set
Object to hash.
- Returns:
- int
Hash of the object.
- topobench.data.utils.read_us_county_demos(path, year=2012, y_col='Election')
Load US County Demos dataset.
- Parameters:
- path : str
Path to the dataset.
- year : int, optional
Year to load the features (default: 2012).
- y_col : str, optional
Column to use as label. Can be one of [‘Election’, ‘MedianIncome’, ‘MigraRate’, ‘BirthRate’, ‘DeathRate’, ‘BachelorRate’, ‘UnemploymentRate’] (default: “Election”).
- Returns:
- torch_geometric.data.Data
Data object of the graph for the US County Demos dataset.
- topobench.data.utils.select_neighborhoods_of_interest(connectivity, neighborhoods)
Select the neighborhoods of interest.
- Parameters:
- connectivity : dict
Connectivity matrices generated by default.
- neighborhoods : list
List of neighborhoods of interest.
- Returns:
- dict
Connectivity matrices of interest.