topobench.evaluator.evaluator module#

This module contains the Evaluator class that is responsible for computing the metrics.

class AbstractEvaluator#

Bases: ABC

Abstract base class for evaluators.

__init__()#

abstract compute()#: Compute the metrics.

abstract reset()#: Reset the metrics.

abstract update(model_out)#

Update the metrics with the model output.

Parameters:

model_out (dict) – The model output.
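As an illustration of the interface above, the following minimal sketch defines a local stand-in for the abstract base (it mirrors the three documented methods and is not imported from topobench) together with a toy subclass. The class name `RunningMAEEvaluator`, the `mae` output key, and the use of plain Python floats under the `logits` and `labels` keys are assumptions made only for this example.

```python
from abc import ABC, abstractmethod


class AbstractEvaluator(ABC):
    """Local stand-in mirroring the documented interface (not the topobench class)."""

    @abstractmethod
    def update(self, model_out: dict) -> None:
        """Update the metrics with the model output."""

    @abstractmethod
    def compute(self) -> dict:
        """Compute the metrics."""

    @abstractmethod
    def reset(self) -> None:
        """Reset the metrics."""


class RunningMAEEvaluator(AbstractEvaluator):
    """Hypothetical evaluator that tracks a running mean absolute error."""

    def __init__(self) -> None:
        self.reset()

    def update(self, model_out: dict) -> None:
        # Accumulate the absolute errors of one batch.
        for pred, label in zip(model_out["logits"], model_out["labels"]):
            self._abs_error_sum += abs(pred - label)
            self._count += 1

    def compute(self) -> dict:
        # Guard against compute() being called before any update().
        if self._count == 0:
            return {"mae": float("nan")}
        return {"mae": self._abs_error_sum / self._count}

    def reset(self) -> None:
        self._abs_error_sum = 0.0
        self._count = 0


evaluator = RunningMAEEvaluator()
evaluator.update({"logits": [1.0, 2.0], "labels": [1.5, 2.0]})
evaluator.update({"logits": [0.0], "labels": [1.0]})
result = evaluator.compute()
print(result)  # {'mae': 0.5}
evaluator.reset()
```

Because all three methods are abstract, a subclass that omits any of them cannot be instantiated.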

class MetricCollection(metrics, *additional_metrics, prefix=None, postfix=None, compute_groups=True)#

Bases: ModuleDict

MetricCollection class can be used to chain metrics that have the same call pattern into one single class.

Parameters:

metrics (Metric | MetricCollection | Sequence[Metric | MetricCollection] | dict[str, Metric | MetricCollection]) –
One of the following
- list or tuple (sequence): if metrics are passed in as a list or tuple, will use the metrics class name as key for output dict. Therefore, two metrics of the same class cannot be chained this way.
- arguments: similar to passing in as a list, metrics passed in as arguments will use their metric class name as key for the output dict.
- dict: if metrics are passed in as a dict, will use each key in the dict as key for output dict. Use this format if you want to chain together multiple of the same metric with different parameters. Note that the keys in the output dict will be sorted alphabetically.
prefix (str | None) – a string to prepend to the keys of the output dict
postfix (str | None) – a string to append after the keys of the output dict
compute_groups (bool | list[list[str]]) – By default the MetricCollection will try to reduce the computations needed for the metrics in the collection by checking if they belong to the same compute group. All metrics in a compute group share the same metric state and are therefore only different in their compute step e.g. accuracy, precision and recall can all be computed from the true positives/negatives and false positives/negatives. By default, this argument is True which enables this feature. Set this argument to False for disabling this behaviour. Can also be set to a list of lists of metrics for setting the compute groups yourself.

Tip

The compute groups feature can significantly speed up the calculation of metrics under the right conditions. First, the feature is only available when calling the update method and not when calling the forward method, because the internal logic of forward prevents it. Second, since compute groups share metric states by reference, calling .items(), .values() etc. on the metric collection will break this reference and return a copy of the states instead (the reference will be reestablished on the next call to update). Note that, for the time being, manually specified compute groups in nested collections are not compatible with the compute groups of the parent collection and will be overridden.

Important

Metric collections can be nested at initialization (see the nested example below), but the output of the collection will still be a single flattened dictionary combining the prefix and postfix arguments from the nested collection.
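The flattened key names can be pictured with a small standalone sketch. It models only the naming, not the library's code: `rename` is a hypothetical helper, and the metric names and prefix/postfix values are borrowed from the nested example on this page.

```python
def rename(names, prefix="", postfix=""):
    """Apply a collection's prefix and postfix to each metric name."""
    return [f"{prefix}{name}{postfix}" for name in names]


base_names = ["MulticlassAccuracy", "MulticlassPrecision"]

# Each inner collection contributes its own postfix first...
macro_names = rename(base_names, postfix="_macro")
micro_names = rename(base_names, postfix="_micro")

# ...then the outer collection's prefix is applied to the flattened result.
flat_names = sorted(rename(macro_names + micro_names, prefix="valmetrics/"))
for name in flat_names:
    print(name)
```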

Raises:

ValueError – If one of the elements of metrics is not an instance of torchmetrics.Metric or torchmetrics.MetricCollection.
ValueError – If two elements in metrics have the same name.
ValueError – If metrics is not a list, tuple or a dict.
ValueError – If metrics is dict and additional_metrics are passed in.
ValueError – If prefix is set and it is not a string.
ValueError – If postfix is set and it is not a string.

Example::

In the most basic case, the metrics can be passed in as a list or tuple. The keys of the output dict will be the same as the class name of the metric:

>>> from torch import tensor
>>> from pprint import pprint
>>> from torchmetrics import MetricCollection
>>> from torchmetrics.regression import MeanSquaredError
>>> from torchmetrics.classification import MulticlassAccuracy, MulticlassPrecision, MulticlassRecall
>>> target = tensor([0, 2, 0, 2, 0, 1, 0, 2])
>>> preds = tensor([2, 1, 2, 0, 1, 2, 2, 2])
>>> metrics = MetricCollection([MulticlassAccuracy(num_classes=3, average='micro'),
...                             MulticlassPrecision(num_classes=3, average='macro'),
...                             MulticlassRecall(num_classes=3, average='macro')])
>>> metrics(preds, target)  
{'MulticlassAccuracy': tensor(0.1250),
 'MulticlassPrecision': tensor(0.0667),
 'MulticlassRecall': tensor(0.1111)}

Example::

Alternatively, metrics can be passed in as arguments. The keys of the output dict will be the same as the class name of the metric:

>>> metrics = MetricCollection(MulticlassAccuracy(num_classes=3, average='micro'),
...                            MulticlassPrecision(num_classes=3, average='macro'),
...                            MulticlassRecall(num_classes=3, average='macro'))
>>> metrics(preds, target)  
{'MulticlassAccuracy': tensor(0.1250),
 'MulticlassPrecision': tensor(0.0667),
 'MulticlassRecall': tensor(0.1111)}

Example::

If multiple of the same metric class (with different parameters) should be chained together, metrics can be passed in as a dict and the output dict will have the same keys as the input dict:

>>> metrics = MetricCollection({'micro_recall': MulticlassRecall(num_classes=3, average='micro'),
...                             'macro_recall': MulticlassRecall(num_classes=3, average='macro')})
>>> same_metric = metrics.clone()
>>> pprint(metrics(preds, target))
{'macro_recall': tensor(0.1111), 'micro_recall': tensor(0.1250)}
>>> pprint(same_metric(preds, target))
{'macro_recall': tensor(0.1111), 'micro_recall': tensor(0.1250)}

Example::

Metric collections can also be nested up to a single time. The output of the collection will still be a single dict with the prefix and postfix arguments from the nested collection:

>>> metrics = MetricCollection([
...     MetricCollection([
...         MulticlassAccuracy(num_classes=3, average='macro'),
...         MulticlassPrecision(num_classes=3, average='macro')
...     ], postfix='_macro'),
...     MetricCollection([
...         MulticlassAccuracy(num_classes=3, average='micro'),
...         MulticlassPrecision(num_classes=3, average='micro')
...     ], postfix='_micro'),
... ], prefix='valmetrics/')
>>> pprint(metrics(preds, target))  
{'valmetrics/MulticlassAccuracy_macro': tensor(0.1111),
 'valmetrics/MulticlassAccuracy_micro': tensor(0.1250),
 'valmetrics/MulticlassPrecision_macro': tensor(0.0667),
 'valmetrics/MulticlassPrecision_micro': tensor(0.1250)}

Example::

The compute_groups argument allows you to specify which metrics should share metric state. By default, the groups are derived automatically, but they can also be set manually.

>>> metrics = MetricCollection(
...     MulticlassRecall(num_classes=3, average='macro'),
...     MulticlassPrecision(num_classes=3, average='macro'),
...     MeanSquaredError(),
...     compute_groups=[['MulticlassRecall', 'MulticlassPrecision'], ['MeanSquaredError']]
... )
>>> metrics.update(preds, target)
>>> pprint(metrics.compute())
{'MeanSquaredError': tensor(2.3750), 'MulticlassPrecision': tensor(0.0667), 'MulticlassRecall': tensor(0.1111)}
>>> pprint(metrics.compute_groups)
{0: ['MulticlassRecall', 'MulticlassPrecision'], 1: ['MeanSquaredError']}
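To make the shared-state idea behind compute groups concrete, the sketch below recomputes three of the values shown in the examples by hand, using plain Python and the same preds and target. It is a simplified illustration under stated assumptions, not how torchmetrics stores its state: the counters `tp`, `fp`, `fn` and the helper `safe_div` are hypothetical names, and treating 0/0 as 0.0 is an assumption of this sketch.

```python
from collections import Counter

target = [0, 2, 0, 2, 0, 1, 0, 2]
preds = [2, 1, 2, 0, 1, 2, 2, 2]
num_classes = 3

# Shared state: a single pass over the data fills counters used by every metric.
tp, fp, fn = Counter(), Counter(), Counter()
for p, t in zip(preds, target):
    if p == t:
        tp[t] += 1
    else:
        fp[p] += 1  # class p was predicted, but wrongly
        fn[t] += 1  # the true class t was missed


def safe_div(num, den):
    # Assumption of this sketch: 0/0 counts as 0.0.
    return num / den if den else 0.0


# Different "compute" steps that all read the same counters.
macro_precision = sum(safe_div(tp[c], tp[c] + fp[c]) for c in range(num_classes)) / num_classes
macro_recall = sum(safe_div(tp[c], tp[c] + fn[c]) for c in range(num_classes)) / num_classes
micro_accuracy = sum(tp.values()) / len(target)

print(round(macro_precision, 4), round(macro_recall, 4), micro_accuracy)
# 0.0667 0.1111 0.125
```

Only the final division differs between the three metrics, which is why a single update of the shared counters is enough for all of them.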

__init__(metrics, *additional_metrics, prefix=None, postfix=None, compute_groups=True)#

Initialize internal Module state, shared by both nn.Module and ScriptModule.

add_metrics(metrics, *additional_metrics)#

Add new metrics to Metric Collection.

clone(prefix=None, postfix=None)#

Make a copy of the metric collection.

Parameters:

prefix (str | None) – a string to prepend to the metric keys
postfix (str | None) – a string to append after the keys of the output dict.

compute()#

Compute the result for each metric in the collection.

forward(*args, **kwargs)#

Call forward for each metric sequentially.

Positional arguments (args) will be passed to every metric in the collection, while keyword arguments (kwargs) will be filtered based on the signature of the individual metric.
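The keyword filtering described above can be pictured with a small standalone sketch. This is not torchmetrics' implementation: `filter_kwargs`, `update_plain` and `update_weighted` are hypothetical names, and the sketch only assumes that filtering keeps the keywords a callable's signature can accept.

```python
import inspect


def filter_kwargs(func, **kwargs):
    """Keep only the keyword arguments that ``func`` can accept."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter means the callable accepts any keyword.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {name: value for name, value in kwargs.items() if name in params}


def update_plain(preds, target):
    return ("plain", preds, target)


def update_weighted(preds, target, sample_weight=None):
    return ("weighted", preds, target, sample_weight)


call_kwargs = {"preds": [1, 0], "target": [1, 1], "sample_weight": [0.5, 2.0]}
plain_kwargs = filter_kwargs(update_plain, **call_kwargs)
weighted_kwargs = filter_kwargs(update_weighted, **call_kwargs)

print(sorted(plain_kwargs))     # ['preds', 'target']
print(sorted(weighted_kwargs))  # ['preds', 'sample_weight', 'target']

# Each callable can now be invoked with just the keywords it understands.
update_plain(**plain_kwargs)
update_weighted(**weighted_kwargs)
```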

items(keep_base=False, copy_state=True)#

Return an iterable of the ModuleDict key/value pairs.

Parameters:

keep_base (bool) – If True, return the base metric names without the collection's prefix/postfix. If False (default), the prefix/postfix are added to the names.
copy_state (bool) – Whether metric states should be copied between metrics in the same compute group or just passed by reference.

keys(keep_base=False)#

Return an iterable of the ModuleDict keys.

Parameters:

keep_base (bool) – If True, return the base metric names without the collection's prefix/postfix. If False (default), the prefix/postfix are added to the names.

persistent(mode=True)#

Change whether the metric states are saved to the state_dict after initialization.

plot(val=None, ax=None, together=False)#

Plot a single or multiple values from the metric.

The plot method has two modes of operation. If the argument together is set to False (default), the .plot method of each metric will be called individually and the result will be a list of figures. If together is set to True, the values of all metrics will instead be plotted in the same figure.

Parameters:

val (dict | Sequence[dict] | None) – Either a single result from calling metric.forward or metric.compute or a list of these results. If no value is provided, will automatically call metric.compute and plot that result.
ax (Axes | Sequence[Axes] | None) – Either a single matplotlib axis object or a sequence of matplotlib axis objects. If provided, the plots are added to the provided axis objects; if not provided, a new one is created. If together is set to True, a single object is expected. If together is set to False, the number of axis objects must equal the number of metrics in the collection.
together (bool) – If True, will plot all metrics in the same axis. If False, will plot each metric in a separate axis.

Returns:

Either a tuple of Figure and Axes objects, or a sequence of tuples of Figure and Axes objects, one for each metric in the collection.

Raises:

ModuleNotFoundError – If matplotlib is not installed
ValueError – If together is not a bool
ValueError – If ax is not an instance of matplotlib axis object or a sequence of matplotlib axis objects

Return type:

Sequence[tuple[Figure, Axes | ndarray]]

reset()#

Call reset for each metric sequentially.

set_dtype(dst_type)#

Transfer all metric state to specific dtype. Special version of standard type method.

Parameters:

dst_type (str | dtype) – the desired type as torch.dtype or string.

update(*args, **kwargs)#

Call update for each metric sequentially.

Positional arguments (args) will be passed to every metric in the collection, while keyword arguments (kwargs) will be filtered based on the signature of the individual metric.

values(copy_state=True)#

Return an iterable of the ModuleDict values.

Parameters:

copy_state (bool) – Whether metric states should be copied between metrics in the same compute group or just passed by reference.

property compute_groups: Dict[int, List[str]]#: Return a dict with the current compute groups in the collection.

property metric_state: dict[str, dict[str, Any]]#: Get the current state of the metric.

class TBEvaluator(task, **kwargs)#

Bases: AbstractEvaluator

Evaluator class that is responsible for computing the metrics.

Parameters:

task (str) – The task type. It can be either “classification” or “regression”.
**kwargs (dict) – Additional arguments for the class. The arguments depend on the task.
In the “classification” scenario, the following arguments are expected:
- num_classes (int): The number of classes.
- metrics (list[str]): A list of classification metrics to be computed.
In the “regression” scenario, the following arguments are expected:
- metrics (list[str]): A list of regression metrics to be computed.
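The task-dependent arguments above can be assembled as in the following sketch. `build_evaluator_kwargs` is a hypothetical helper written for this page, not part of TopoBench, and the metric names `accuracy` and `mse` are placeholders rather than a list of supported metrics.

```python
def build_evaluator_kwargs(task, metrics, num_classes=None):
    """Hypothetical helper: return keyword arguments in the documented shape."""
    if task == "classification":
        if num_classes is None:
            raise ValueError("'classification' requires num_classes")
        return {"task": task, "num_classes": int(num_classes), "metrics": list(metrics)}
    if task == "regression":
        return {"task": task, "metrics": list(metrics)}
    raise ValueError(f"Invalid task: {task!r}")


# The metric names below are placeholders, not a list of supported metrics.
clf_kwargs = build_evaluator_kwargs("classification", ["accuracy"], num_classes=3)
reg_kwargs = build_evaluator_kwargs("regression", ["mse"])
print(clf_kwargs)
print(reg_kwargs)
# With TopoBench installed, these would be passed on as TBEvaluator(**clf_kwargs).
```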

__init__(task, **kwargs)#

compute()#

Compute the metrics.

Returns:

dict: Dictionary containing the computed metrics.

reset()#

Reset the metrics.

This method should be called after each epoch.

update(model_out)#

Update the metrics with the model output.

Parameters:

model_out (dict) – The model output. It should contain the following keys:
- logits (torch.Tensor): The model predictions.
- labels (torch.Tensor): The ground truth labels.
- batch (torch_geometric.data.Data, optional): The batch data containing target normalizer stats.

Raises:

ValueError: If the task is not valid.
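The call order implied by the methods above (update once per batch, compute at the end of the epoch, then reset) can be sketched as a small driver. `run_epoch` and `CountingEvaluator` are hypothetical names: the stand-in only counts labels so that the example runs without TopoBench, and plain lists replace the tensors a real model output would contain.

```python
def run_epoch(evaluator, model_outputs):
    """Feed every batch to the evaluator, return the epoch metrics, then reset it."""
    for model_out in model_outputs:
        evaluator.update(model_out)
    results = evaluator.compute()
    evaluator.reset()  # start the next epoch from a clean state
    return results


class CountingEvaluator:
    """Hypothetical stand-in exposing update/compute/reset; it only counts labels."""

    def __init__(self):
        self.num_seen = 0

    def update(self, model_out):
        self.num_seen += len(model_out["labels"])

    def compute(self):
        return {"num_samples": self.num_seen}

    def reset(self):
        self.num_seen = 0


stub = CountingEvaluator()
outputs = [
    {"logits": [0.2, 0.8], "labels": [0, 1]},
    {"logits": [0.9], "labels": [1]},
]
epoch_metrics = run_epoch(stub, outputs)
print(epoch_metrics)  # {'num_samples': 3}
```

Any object with the same three methods can be passed to `run_epoch` in place of the stand-in.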