topobench.data.preprocessor.preprocessor module
Preprocessor for datasets.
- class DataTransform(transform_name, **kwargs)
Bases: BaseTransform
Abstract class to define a custom data lifting.
- Parameters:
- transform_name : str
The name of the transform to be used.
- **kwargs : dict
Additional arguments for the class. Should contain “transform_name”.
- __init__(transform_name, **kwargs)
- forward(data)
Forward pass of the lifting.
- Parameters:
- data : torch_geometric.data.Data
The input data to be lifted.
- Returns:
- torch_geometric.data.Data
The lifted data.
- class DataloadDataset(data_lst)
Bases: Dataset
Custom dataset to return all the values added to the dataset object.
- Parameters:
- data_lst : list[torch_geometric.data.Data]
List of torch_geometric.data.Data objects.
- __init__(data_lst)
- get(idx)
Get data object from data list.
- Parameters:
- idx : int
Index of the data object to get.
- Returns:
- tuple
Tuple containing a list of all the values for the data and the corresponding keys.
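The return shape described for get can be illustrated with a minimal sketch. It uses plain dictionaries as stand-ins for torch_geometric.data.Data objects, and the class name DataListSketch is hypothetical (it is not part of topobench and does not inherit from Dataset); it only shows the "values plus keys" tuple.

```python
class DataListSketch:
    """Hypothetical stand-in that wraps a list of dict-like records."""

    def __init__(self, data_lst):
        self.data_lst = data_lst

    def len(self):
        # Number of stored records.
        return len(self.data_lst)

    def get(self, idx):
        # Return (values, keys) so the caller can pair them up again later.
        record = self.data_lst[idx]
        keys = list(record.keys())
        values = [record[key] for key in keys]
        return values, keys


dataset = DataListSketch([{"x": [1.0, 2.0], "y": 0}, {"x": [3.0], "y": 1}])
values, keys = dataset.get(1)
```

Returning the keys alongside the values lets the consumer rebuild a record with dict(zip(keys, values)) without knowing the attribute names in advance.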
- len()
Return the length of the dataset.
- Returns:
- int
Length of the dataset.
- FileLock
alias of UnixFileLock
- class PreProcessor(dataset, data_dir, transforms_config=None, **kwargs)
Bases: InMemoryDataset
Preprocessor for datasets.
- Parameters:
- dataset : list
List of data objects.
- data_dir : str
Path to the directory containing the data.
- transforms_config : DictConfig, optional
Configuration parameters for the transforms (default: None).
- **kwargs : optional
Optional additional arguments.
- __init__(dataset, data_dir, transforms_config=None, **kwargs)
- instantiate_pre_transform(data_dir, transforms_config)
Instantiate the pre-transforms.
- Parameters:
- data_dir : str
Path to the directory containing the data.
- transforms_config : DictConfig
Configuration parameters for the transforms.
- Returns:
- torch_geometric.transforms.Compose
Pre-transform object.
- load(path)
Load the dataset from the given file path.
- Parameters:
- path : str
The path to the processed data.
- load_dataset_splits(split_params)
Load the dataset splits.
- Parameters:
- split_params : dict
Parameters for loading the dataset splits.
- Returns:
- tuple
A tuple containing the train, validation, and test datasets.
- process()
Process the data.
- save_transform_parameters()
Save the transform parameters.
- set_processed_data_dir(pre_transforms_dict, data_dir, transforms_config)
Set the processed data directory.
- Parameters:
- pre_transforms_dict : dict
Dictionary containing the pre-transforms.
- data_dir : str
Path to the directory containing the data.
- transforms_config : DictConfig
Configuration parameters for the transforms.
- class tqdm(*_, **__)
Bases: Comparable
Decorate an iterable object, returning an iterator which acts exactly like the original iterable, but prints a dynamically updating progressbar every time a value is requested.
- Parameters:
- iterable : iterable, optional
Iterable to decorate with a progressbar. Leave blank to manually manage the updates.
- desc : str, optional
Prefix for the progressbar.
- total : int or float, optional
The number of expected iterations. If unspecified, len(iterable) is used if possible. If float(“inf”) or as a last resort, only basic progress statistics are displayed (no ETA, no progressbar). If gui is True and this parameter needs subsequent updating, specify an initial arbitrary large positive number, e.g. 9e9.
- leave : bool, optional
If [default: True], keeps all traces of the progressbar upon termination of iteration. If None, will leave only if position is 0.
- file : io.TextIOWrapper or io.StringIO, optional
Specifies where to output the progress messages (default: sys.stderr). Uses file.write(str) and file.flush() methods. For encoding, see write_bytes.
- ncols : int, optional
The width of the entire output message. If specified, dynamically resizes the progressbar to stay within this bound. If unspecified, attempts to use environment width. The fallback is a meter width of 10 and no limit for the counter and statistics. If 0, will not print any meter (only stats).
- mininterval : float, optional
Minimum progress display update interval [default: 0.1] seconds.
- maxinterval : float, optional
Maximum progress display update interval [default: 10] seconds. Automatically adjusts miniters to correspond to mininterval after long display update lag. Only works if dynamic_miniters or monitor thread is enabled.
- miniters : int or float, optional
Minimum progress display update interval, in iterations. If 0 and dynamic_miniters, will automatically adjust to equal mininterval (more CPU efficient, good for tight loops). If > 0, will skip display of specified number of iterations. Tweak this and mininterval to get very efficient loops. If your progress is erratic with both fast and slow iterations (network, skipping items, etc) you should set miniters=1.
- ascii : bool or str, optional
If unspecified or False, use unicode (smooth blocks) to fill the meter. The fallback is to use ASCII characters “ 123456789#”.
- disable : bool, optional
Whether to disable the entire progressbar wrapper [default: False]. If set to None, disable on non-TTY.
- unit : str, optional
String that will be used to define the unit of each iteration [default: it].
- unit_scale : bool or int or float, optional
If 1 or True, the number of iterations will be reduced/scaled automatically and a metric prefix following the International System of Units standard will be added (kilo, mega, etc.) [default: False]. If any other non-zero number, will scale total and n.
- dynamic_ncols : bool, optional
If set, constantly alters ncols and nrows to the environment (allowing for window resizes) [default: False].
- smoothing : float, optional
Exponential moving average smoothing factor for speed estimates (ignored in GUI mode). Ranges from 0 (average speed) to 1 (current/instantaneous speed) [default: 0.3].
- bar_format : str, optional
Specify a custom bar string formatting. May impact performance. [default: '{l_bar}{bar}{r_bar}'], where l_bar='{desc}: {percentage:3.0f}%|' and r_bar='| {n_fmt}/{total_fmt} [{elapsed}<{remaining}, {rate_fmt}{postfix}]'.
Possible vars: l_bar, bar, r_bar, n, n_fmt, total, total_fmt, percentage, elapsed, elapsed_s, ncols, nrows, desc, unit, rate, rate_fmt, rate_noinv, rate_noinv_fmt, rate_inv, rate_inv_fmt, postfix, unit_divisor, remaining, remaining_s, eta.
Note that a trailing ": " is automatically removed after {desc} if the latter is empty.
- initial : int or float, optional
The initial counter value. Useful when restarting a progress bar [default: 0]. If using float, consider specifying {n:.3f} or similar in bar_format, or specifying unit_scale.
- position : int, optional
Specify the line offset to print this bar (starting from 0). Automatic if unspecified. Useful to manage multiple bars at once (e.g., from threads).
- postfix : dict or *, optional
Specify additional stats to display at the end of the bar. Calls set_postfix(**postfix) if possible (dict).
- unit_divisor : float, optional
[default: 1000], ignored unless unit_scale is True.
- write_bytes : bool, optional
Whether to write bytes. If False (default), writes unicode.
- lock_args : tuple, optional
Passed to refresh for intermediate output (initialisation, iterating, and updating).
- nrows : int, optional
The screen height. If specified, hides nested bars outside this bound. If unspecified, attempts to use environment height. The fallback is 20.
- colour : str, optional
Bar colour (e.g. ‘green’, ‘#00ff00’).
- delay : float, optional
Don’t display until [default: 0] seconds have elapsed.
- gui : bool, optional
WARNING: internal parameter - do not use. Use tqdm.gui.tqdm(…) instead. If set, will attempt to use matplotlib animations for a graphical output [default: False].
- Returns:
- out : decorated iterator.
- classmethod external_write_mode(file=None, nolock=False)
Disable tqdm within context and refresh tqdm when exits. Useful when writing to standard output stream.
- classmethod get_lock()
Get the global lock. Construct it if it does not exist.
- classmethod pandas(**tqdm_kwargs)
Registers the current tqdm class with pandas.core.(frame.DataFrame | series.Series | groupby.(generic.)DataFrameGroupBy | groupby.(generic.)SeriesGroupBy).progress_apply.
A new instance will be created every time progress_apply is called, and each instance will automatically close() upon completion.
- Parameters:
- tqdm_kwargs : arguments for the tqdm instance
References
<https://stackoverflow.com/questions/18603270/progress-indicator-during-pandas-operations-python>
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from tqdm import tqdm
>>> from tqdm.gui import tqdm as tqdm_gui
>>>
>>> df = pd.DataFrame(np.random.randint(0, 100, (100000, 6)))
>>> tqdm.pandas(ncols=50)  # can use tqdm_gui, optional kwargs, etc
>>> # Now you can use `progress_apply` instead of `apply`
>>> df.groupby(0).progress_apply(lambda x: x**2)
- classmethod set_lock(lock)
Set the global lock.
- classmethod wrapattr(stream, method, total=None, bytes=True, **tqdm_kwargs)
stream : file-like object.
method : str, “read” or “write”. The result of read() and the first argument of write() should have a len().
>>> with tqdm.wrapattr(file_obj, "read", total=file_obj.size) as fobj:
...     while True:
...         chunk = fobj.read(chunk_size)
...         if not chunk:
...             break
- classmethod write(s, file=None, end='\n', nolock=False)
Print a message via tqdm (without overlap with bars).
- static format_interval(t)
Formats a number of seconds as a clock time, [H:]MM:SS.
- Parameters:
- t : int
Number of seconds.
- Returns:
- out : str
[H:]MM:SS
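The [H:]MM:SS layout can be sketched as follows. The helper name format_interval_sketch is hypothetical; this reproduces the documented output shape (hours shown only when non-zero) and is not a copy of the library's own code.

```python
def format_interval_sketch(t):
    """Format a number of seconds as [H:]MM:SS (illustrative helper)."""
    minutes, seconds = divmod(int(t), 60)
    hours, minutes = divmod(minutes, 60)
    if hours:
        # Hours are not zero-padded; minutes and seconds always are.
        return f"{hours:d}:{minutes:02d}:{seconds:02d}"
    return f"{minutes:02d}:{seconds:02d}"


short = format_interval_sketch(75)
long = format_interval_sketch(3661)
```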
- static format_meter(n, total, elapsed, ncols=None, prefix='', ascii=False, unit='it', unit_scale=False, rate=None, bar_format=None, postfix=None, unit_divisor=1000, initial=0, colour=None, **extra_kwargs)
Return a string-based progress bar given some parameters.
- Parameters:
- n : int or float
Number of finished iterations.
- total : int or float
The expected total number of iterations. If meaningless (None), only basic progress statistics are displayed (no ETA).
- elapsed : float
Number of seconds passed since start.
- ncols : int, optional
The width of the entire output message. If specified, dynamically resizes {bar} to stay within this bound [default: None]. If 0, will not print any bar (only stats). The fallback is {bar:10}.
- prefix : str, optional
Prefix message (included in total width) [default: ‘’]. Use as {desc} in bar_format string.
- ascii : bool or str, optional
If not set, use unicode (smooth blocks) to fill the meter [default: False]. The fallback is to use ASCII characters “ 123456789#”.
- unit : str, optional
The iteration unit [default: ‘it’].
- unit_scale : bool or int or float, optional
If 1 or True, the number of iterations will be printed with an appropriate SI metric prefix (k = 10^3, M = 10^6, etc.) [default: False]. If any other non-zero number, will scale total and n.
- rate : float, optional
Manual override for iteration rate. If [default: None], uses n/elapsed.
- bar_format : str, optional
Specify a custom bar string formatting. May impact performance. [default: '{l_bar}{bar}{r_bar}'], where l_bar='{desc}: {percentage:3.0f}%|' and r_bar='| {n_fmt}/{total_fmt} [{elapsed}<{remaining}, {rate_fmt}{postfix}]'.
Possible vars: l_bar, bar, r_bar, n, n_fmt, total, total_fmt, percentage, elapsed, elapsed_s, ncols, nrows, desc, unit, rate, rate_fmt, rate_noinv, rate_noinv_fmt, rate_inv, rate_inv_fmt, postfix, unit_divisor, remaining, remaining_s, eta.
Note that a trailing ": " is automatically removed after {desc} if the latter is empty.
- postfix : *, optional
Similar to prefix, but placed at the end (e.g. for additional stats). Note: postfix is usually a string (not a dict) for this method, and will if possible be set to postfix = ‘, ‘ + postfix. However other types are supported (#382).
- unit_divisor : float, optional
[default: 1000], ignored unless unit_scale is True.
- initial : int or float, optional
The initial counter value [default: 0].
- colour : str, optional
Bar colour (e.g. ‘green’, ‘#00ff00’).
- Returns:
- out : Formatted meter and stats, ready to display.
- static format_num(n)
Intelligent scientific notation (.3g).
- Parameters:
- n : int or float or Numeric
A Number.
- Returns:
- out : str
Formatted number.
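One plausible reading of "intelligent" here is: format with .3g, tidy the exponent, and fall back to the plain string when that is no longer. The sketch below works under that assumption; format_num_sketch is a hypothetical name and the exact rule used by the library may differ.

```python
def format_num_sketch(n):
    """Pick the shorter of a compact .3g rendering and str(n)."""
    # '.3g' keeps three significant digits and may switch to exponent form.
    compact = f"{n:.3g}".replace("e+0", "e+").replace("e-0", "e-")
    plain = str(n)
    # Only use the compact form when it actually saves characters.
    return compact if len(compact) < len(plain) else plain


big = format_num_sketch(12345678)
small = format_num_sketch(42)
```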
- static format_sizeof(num, suffix='', divisor=1000)
Formats a number (greater than unity) with SI Order of Magnitude prefixes.
- Parameters:
- num : float
Number ( >= 1) to format.
- suffix : str, optional
Post-postfix [default: ‘’].
- divisor : float, optional
Divisor between prefixes [default: 1000].
- Returns:
- out : str
Number with Order of Magnitude SI unit postfix.
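A minimal sketch of SI-prefix scaling, assuming the number is divided by divisor until it fits in three significant digits. The helper name format_sizeof_sketch and the rounding thresholds are illustrative choices, not a statement of the library's exact behaviour.

```python
def format_sizeof_sketch(num, suffix="", divisor=1000):
    """Scale num down by divisor and attach the matching SI prefix."""
    for prefix in ["", "k", "M", "G", "T", "P", "E", "Z"]:
        if abs(num) < 999.5:
            # Keep roughly three significant digits at every magnitude.
            if abs(num) < 9.995:
                return f"{num:1.2f}{prefix}{suffix}"
            if abs(num) < 99.95:
                return f"{num:2.1f}{prefix}{suffix}"
            return f"{num:3.0f}{prefix}{suffix}"
        num /= divisor
    # Anything still too large after 'Z' is reported in 'Y'.
    return f"{num:3.1f}Y{suffix}"


decimal = format_sizeof_sketch(1500)
binary = format_sizeof_sketch(2048, suffix="B", divisor=1024)
```

Passing divisor=1024 gives binary-style scaling, which is the usual choice for byte counts.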
- static status_printer(file)
Manage the printing and in-place updating of a line of characters. Note that if the string is longer than a line, then in-place updating may not work (it will print a new line at each refresh).
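In-place updating of a single line is commonly done with a carriage return plus padding to erase leftovers from a longer previous message. The sketch below shows that idea under simplifying assumptions: status_printer_sketch is a hypothetical name, and it measures width with len(), which ignores wide or zero-width characters.

```python
import io


def status_printer_sketch(file):
    """Return a function that rewrites one line of `file` in place."""
    last_len = 0

    def print_status(s):
        nonlocal last_len
        # Pad with spaces so a shorter message fully overwrites a longer one.
        padding = " " * max(last_len - len(s), 0)
        file.write("\r" + s + padding)
        file.flush()
        last_len = len(s)

    return print_status


buffer = io.StringIO()  # stands in for sys.stderr so the output can be inspected
show = status_printer_sketch(buffer)
show("loading 10%")
show("done")
```

Because the carriage return only moves to the start of the current terminal line, a message that wraps onto a second line cannot be overwritten this way, which matches the caveat in the description above.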
- __init__(iterable=None, desc=None, total=None, leave=True, file=None, ncols=None, mininterval=0.1, maxinterval=10.0, miniters=None, ascii=None, disable=False, unit='it', unit_scale=False, dynamic_ncols=False, smoothing=0.3, bar_format=None, initial=0, position=None, postfix=None, unit_divisor=1000, write_bytes=False, lock_args=None, nrows=None, colour=None, delay=0.0, gui=False, **kwargs)
- clear(nolock=False)
Clear current bar display.
- close()
Cleanup and (if leave=False) close the progressbar.
- display(msg=None, pos=None)
Use self.sp to display msg in the specified pos.
Consider overloading this function when inheriting to use e.g.: self.some_frontend(**self.format_dict) instead of self.sp.
- Parameters:
- msg : str, optional
What to display (default: repr(self)).
- pos : int, optional
Position to moveto (default: abs(self.pos)).
- moveto(n)
- refresh(nolock=False, lock_args=None)
Force refresh the display of this bar.
- Parameters:
- nolock : bool, optional
If True, does not lock. If [default: False]: calls acquire() on internal lock.
- lock_args : tuple, optional
Passed to internal lock’s acquire(). If specified, will only display() if acquire() returns True.
- reset(total=None)
Resets to 0 iterations for repeated use.
Consider combining with leave=True.
- Parameters:
- total : int or float, optional
Total to use for the new bar.
- set_description(desc=None, refresh=True)
Set/modify description of the progress bar.
- Parameters:
- desc : str, optional
- refresh : bool, optional
Forces refresh [default: True].
- set_description_str(desc=None, refresh=True)
Set/modify description without ‘: ‘ appended.
- set_postfix(ordered_dict=None, refresh=True, **kwargs)
Set/modify postfix (additional stats) with automatic formatting based on datatype.
- Parameters:
- ordered_dict : dict or OrderedDict, optional
- refresh : bool, optional
Forces refresh [default: True].
- kwargs : dict, optional
- set_postfix_str(s='', refresh=True)
Postfix without dictionary expansion, similar to prefix handling.
- unpause()
Restart tqdm timer from last print time.
- update(n=1)
Manually update the progress bar, useful for streams such as reading files. E.g.:
>>> t = tqdm(total=filesize)  # Initialise
>>> for current_buffer in stream:
...     ...
...     t.update(len(current_buffer))
>>> t.close()
The last line is highly recommended, but possibly not necessary if t.update() will be called in such a way that filesize will be exactly reached and printed.
- Parameters:
- n : int or float, optional
Increment to add to the internal counter of iterations [default: 1]. If using float, consider specifying {n:.3f} or similar in bar_format, or specifying unit_scale.
- Returns:
- out : bool or None
True if a display() was triggered.
- property format_dict
Public API for read-only member access.
- monitor = None
- monitor_interval = 10
- ensure_serializable(obj)
Ensure that the object is serializable.
- Parameters:
- obj : object
Object to ensure serializability.
- Returns:
- object
Object that is serializable.
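The page does not say which serialization format is meant or how unsupported values are handled. As a sketch only, assuming the target is JSON and that unknown objects may be replaced by their string form, a recursive conversion could look like this (ensure_serializable_sketch is a hypothetical name, not the topobench function):

```python
import json
import pathlib


def ensure_serializable_sketch(obj):
    """Recursively convert obj into JSON-friendly built-in types."""
    if isinstance(obj, dict):
        return {str(k): ensure_serializable_sketch(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [ensure_serializable_sketch(v) for v in obj]
    if isinstance(obj, (set, frozenset)):
        # Sets have no JSON equivalent; emit a list in a deterministic order.
        return sorted((ensure_serializable_sketch(v) for v in obj), key=repr)
    if obj is None or isinstance(obj, (str, int, float, bool)):
        return obj
    # Assumption: fall back to the string form for anything else.
    return str(obj)


config = {
    "lifting": ("a", "b"),
    "ranks": {2, 1},
    "path": pathlib.PurePosixPath("data/raw"),
}
clean = ensure_serializable_sketch(config)
encoded = json.dumps(clean, sort_keys=True)
```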
- load_inductive_splits(dataset, parameters)
Load multiple-graph datasets with the specified split.
- Parameters:
- dataset : torch_geometric.data.Dataset
Graph dataset.
- parameters : DictConfig
Configuration parameters.
- Returns:
- list
List containing the train, validation, and test splits.
- load_transductive_splits(dataset, parameters)
Load the graph dataset with the specified split.
- Parameters:
- dataset : torch_geometric.data.Dataset
Graph dataset.
- parameters : DictConfig
Configuration parameters.
- Returns:
- list
List containing the train, validation, and test splits.
- make_hash(o)
Make a hash from a dictionary, list, tuple or set to any level, that contains only other hashable types.
- Parameters:
- o : dict, list, tuple, set
Object to hash.
- Returns:
- int
Hash of the object.
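Hashing a nested container "to any level" can be done by recursing into each container and combining the hashes of its parts. The sketch below is one way to do that with the built-in hash; make_hash_sketch is a hypothetical name and the combination scheme is an assumption, not the topobench implementation.

```python
def make_hash_sketch(o):
    """Hash nested dicts, lists, tuples and sets of hashable leaves."""
    if isinstance(o, dict):
        # Order-independent: a frozenset of (key, hashed value) pairs.
        return hash(frozenset((k, make_hash_sketch(v)) for k, v in o.items()))
    if isinstance(o, (set, frozenset)):
        return hash(frozenset(make_hash_sketch(e) for e in o))
    if isinstance(o, (list, tuple)):
        # Order-dependent: position matters for sequences.
        return hash(tuple(make_hash_sketch(e) for e in o))
    return hash(o)


first = {"lifting": [1, {"rank": 2}], "ids": {1, 2}}
second = {"ids": {2, 1}, "lifting": [1, {"rank": 2}]}
same = make_hash_sketch(first) == make_hash_sketch(second)
```

Note that Python salts the built-in hash of str and bytes per process unless PYTHONHASHSEED is fixed, so a value computed this way is only comparable within one interpreter run when strings are involved.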