topobench.transforms.liftings.pointcloud2hypergraph.mogmst_lifting module#

Mixture of Gaussians and Minimum Spanning Tree (MoGMST) Lifting.

class GaussianMixture(n_components=1, *, covariance_type='full', tol=0.001, reg_covar=1e-06, max_iter=100, n_init=1, init_params='kmeans', weights_init=None, means_init=None, precisions_init=None, random_state=None, warm_start=False, verbose=0, verbose_interval=10)#

Bases: BaseMixture

Gaussian Mixture.

Representation of a Gaussian mixture model probability distribution. This class allows to estimate the parameters of a Gaussian mixture distribution.

Read more in the User Guide.

Added in version 0.18.

Parameters:

n_componentsint, default=1

The number of mixture components.

covariance_type{‘full’, ‘tied’, ‘diag’, ‘spherical’}, default=’full’

String describing the type of covariance parameters to use. Must be one of:

‘full’: each component has its own general covariance matrix.
‘tied’: all components share the same general covariance matrix.
‘diag’: each component has its own diagonal covariance matrix.
‘spherical’: each component has its own single variance.

For an example of using covariance_type, refer to sphx_glr_auto_examples_mixture_plot_gmm_selection.py.

tolfloat, default=1e-3

The convergence threshold. EM iterations will stop when the lower bound average gain is below this threshold.

reg_covarfloat, default=1e-6

Non-negative regularization added to the diagonal of covariance. Allows to assure that the covariance matrices are all positive.

max_iterint, default=100

The number of EM iterations to perform.

n_initint, default=1

The number of initializations to perform. The best results are kept.

init_params{‘kmeans’, ‘k-means++’, ‘random’, ‘random_from_data’}, default=’kmeans’

The method used to initialize the weights, the means and the precisions. String must be one of:

‘kmeans’ : responsibilities are initialized using kmeans.
‘k-means++’ : use the k-means++ method to initialize.
‘random’ : responsibilities are initialized randomly.
‘random_from_data’ : initial means are randomly selected data points.

Changed in version v1.1: init_params now accepts ‘random_from_data’ and ‘k-means++’ as initialization methods.

weights_initarray-like of shape (n_components, ), default=None

The user-provided initial weights. If it is None, weights are initialized using the init_params method.

means_initarray-like of shape (n_components, n_features), default=None

The user-provided initial means, If it is None, means are initialized using the init_params method.

precisions_initarray-like, default=None

The user-provided initial precisions (inverse of the covariance matrices). If it is None, precisions are initialized using the ‘init_params’ method. The shape depends on ‘covariance_type’:

(n_components,)                        if 'spherical',
(n_features, n_features)               if 'tied',
(n_components, n_features)             if 'diag',
(n_components, n_features, n_features) if 'full'

random_stateint, RandomState instance or None, default=None

Controls the random seed given to the method chosen to initialize the parameters (see init_params). In addition, it controls the generation of random samples from the fitted distribution (see the method sample). Pass an int for reproducible output across multiple function calls. See Glossary.

warm_startbool, default=False

If ‘warm_start’ is True, the solution of the last fitting is used as initialization for the next call of fit(). This can speed up convergence when fit is called several times on similar problems. In that case, ‘n_init’ is ignored and only a single initialization occurs upon the first call. See the Glossary.

verboseint, default=0

Enable verbose output. If 1 then it prints the current initialization and each iteration step. If greater than 1 then it prints also the log probability and the time needed for each step.

verbose_intervalint, default=10

Number of iteration done before the next print.

Attributes:

weights_array-like of shape (n_components,)

The weights of each mixture components.

means_array-like of shape (n_components, n_features)

The mean of each mixture component.

covariances_array-like

The covariance of each mixture component. The shape depends on covariance_type:

(n_components,)                        if 'spherical',
(n_features, n_features)               if 'tied',
(n_components, n_features)             if 'diag',
(n_components, n_features, n_features) if 'full'

For an example of using covariances, refer to sphx_glr_auto_examples_mixture_plot_gmm_covariances.py.

precisions_array-like

The precision matrices for each component in the mixture. A precision matrix is the inverse of a covariance matrix. A covariance matrix is symmetric positive definite so the mixture of Gaussian can be equivalently parameterized by the precision matrices. Storing the precision matrices instead of the covariance matrices makes it more efficient to compute the log-likelihood of new samples at test time. The shape depends on covariance_type:

(n_components,)                        if 'spherical',
(n_features, n_features)               if 'tied',
(n_components, n_features)             if 'diag',
(n_components, n_features, n_features) if 'full'

precisions_cholesky_array-like

The cholesky decomposition of the precision matrices of each mixture component. A precision matrix is the inverse of a covariance matrix. A covariance matrix is symmetric positive definite so the mixture of Gaussian can be equivalently parameterized by the precision matrices. Storing the precision matrices instead of the covariance matrices makes it more efficient to compute the log-likelihood of new samples at test time. The shape depends on covariance_type:

(n_components,)                        if 'spherical',
(n_features, n_features)               if 'tied',
(n_components, n_features)             if 'diag',
(n_components, n_features, n_features) if 'full'

converged_bool

True when convergence of the best fit of EM was reached, False otherwise.

n_iter_int

Number of step used by the best fit of EM to reach the convergence.

lower_bound_float

Lower bound value on the log-likelihood (of the training data with respect to the model) of the best fit of EM.

lower_bounds_array-like of shape (n_iter_,)

The list of lower bound values on the log-likelihood from each iteration of the best fit of EM.

n_features_in_int

Number of features seen during fit.

Added in version 0.24.

feature_names_in_ndarray of shape (n_features_in_,)

Names of features seen during fit. Defined only when X has feature names that are all strings.

Added in version 1.0.

See also

to_numpy_array

Notes

For directed graphs, explicitly mention create_using=nx.DiGraph, and entry i,j of A corresponds to an edge from i to j.

If create_using is networkx.MultiGraph or networkx.MultiDiGraph, parallel_edges is True, and the entries of A are of type int, then this function returns a multigraph (of the same type as create_using) with parallel edges.

If create_using indicates an undirected multigraph, then only the edges indicated by the upper triangle of the array A will be added to the graph.

If the NumPy array has a single data type for each array entry it will be converted to an appropriate Python data type.

If the NumPy array has a user-specified compound data type the names of the data fields will be used as attribute keys in the resulting NetworkX graph.

Examples

Simple integer weights on edges:

>>> import numpy as np
>>> A = np.array([[1, 1], [2, 1]])
>>> G = nx.from_numpy_array(A)
>>> G.edges(data=True)
EdgeDataView([(0, 0, {'weight': 1}), (0, 1, {'weight': 2}), (1, 1, {'weight': 1})])

If create_using indicates a multigraph and the array has only integer entries and parallel_edges is False, then the entries will be treated as weights for edges joining the nodes (without creating parallel edges):

>>> A = np.array([[1, 1], [1, 2]])
>>> G = nx.from_numpy_array(A, create_using=nx.MultiGraph)
>>> G[1][1]
AtlasView({0: {'weight': 2}})

If create_using indicates a multigraph and the array has only integer entries and parallel_edges is True, then the entries will be treated as the number of parallel edges joining those two vertices:

>>> A = np.array([[1, 1], [1, 2]])
>>> temp = nx.MultiGraph()
>>> G = nx.from_numpy_array(A, parallel_edges=True, create_using=temp)
>>> G[1][1]
AtlasView({0: {'weight': 1}, 1: {'weight': 1}})

User defined compound data type on edges:

>>> dt = [("weight", float), ("cost", int)]
>>> A = np.array([[(1.0, 2)]], dtype=dt)
>>> G = nx.from_numpy_array(A)
>>> G.edges()
EdgeView([(0, 0)])
>>> G[0][0]["cost"]
2
>>> G[0][0]["weight"]
1.0

minimum_spanning_tree(G, weight='weight', algorithm='kruskal', ignore_nan=False)#

Returns a minimum spanning tree or forest on an undirected graph G.

Parameters:

Gundirected graph: An undirected graph. If G is connected, then the algorithm finds a spanning tree. Otherwise, a spanning forest is found.
weightstr: Data key to use for edge weights.
algorithmstring: The algorithm to use when finding a minimum spanning tree. Valid choices are ‘kruskal’, ‘prim’, or ‘boruvka’. The default is ‘kruskal’.
ignore_nanbool (default: False): If a NaN is found as an edge weight normally an exception is raised. If ignore_nan is True then that edge is ignored instead.

Returns:

GNetworkX Graph: A minimum spanning tree or forest.

Notes

For Borůvka’s algorithm, each edge must have a weight attribute, and each edge weight must be distinct.

For the other algorithms, if the graph edges do not have a weight attribute a default weight of 1 will be used.

There may be more than one tree with the same minimum or maximum weight. See networkx.tree.recognition for more detailed definitions.

Isolated nodes with self-loops are in the tree as edgeless isolated nodes.

Examples

>>> G = nx.cycle_graph(4)
>>> G.add_edge(0, 3, weight=2)
>>> T = nx.minimum_spanning_tree(G)
>>> sorted(T.edges(data=True))
[(0, 1, {}), (1, 2, {}), (2, 3, {})]

pairwise_distances(X, Y=None, metric='euclidean', *, n_jobs=None, force_all_finite='deprecated', ensure_all_finite=None, **kwds)#

Compute the distance matrix from a feature array X and optional Y.

This function takes one or two feature arrays or a distance matrix, and returns a distance matrix.

If X is a feature array, of shape (n_samples_X, n_features), and:
- Y is None and metric is not ‘precomputed’, the pairwise distances between X and itself are returned.
- Y is a feature array of shape (n_samples_Y, n_features), the pairwise distances between X and Y is returned.
If X is a distance matrix, of shape (n_samples_X, n_samples_X), metric should be ‘precomputed’. Y is thus ignored and X is returned as is.

If the input is a collection of non-numeric data (e.g. a list of strings or a boolean array), a custom metric must be passed.

This method provides a safe way to take a distance matrix as input, while preserving compatibility with many other algorithms that take a vector array.

Valid values for metric are:

From scikit-learn: [‘cityblock’, ‘cosine’, ‘euclidean’, ‘l1’, ‘l2’, ‘manhattan’, ‘nan_euclidean’]. All metrics support sparse matrix inputs except ‘nan_euclidean’.
From scipy.spatial.distance: [‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘correlation’, ‘dice’, ‘hamming’, ‘jaccard’, ‘kulsinski’, ‘mahalanobis’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘yule’]. These metrics do not support sparse matrix inputs.

Note

‘kulsinski’ is deprecated from SciPy 1.9 and will be removed in SciPy 1.11.

Note

‘matching’ has been removed in SciPy 1.9 (use ‘hamming’ instead).

Note that in the case of ‘cityblock’, ‘cosine’ and ‘euclidean’ (which are valid scipy.spatial.distance metrics), the scikit-learn implementation will be used, which is faster and has support for sparse matrices (except for ‘cityblock’). For a verbose description of the metrics from scikit-learn, see sklearn.metrics.pairwise.distance_metrics() function.