
Configurations

Data and configurations

TrainConfig(max_iter=10000, print_every=0, write_every=0, checkpoint_every=0, plot_every=0, live_plot_every=0, callbacks=lambda: list()(), log_model=False, root_folder=Path('./qml_logs'), create_subfolder_per_run=False, log_folder=Path('./'), checkpoint_best_only=False, val_every=0, val_epsilon=1e-05, validation_criterion=None, trainstop_criterion=None, batch_size=1, verbose=True, tracking_tool=ExperimentTrackingTool.TENSORBOARD, hyperparams=dict(), plotting_functions=tuple(), _subfolders=list(), nprocs=1, compute_setup='cpu', backend='gloo', log_setup='cpu', dtype=None, all_reduce_metrics=False) dataclass

Default configuration for the training process.

This class provides default settings for various aspects of the training loop, such as logging, checkpointing, and validation. The default values for these fields can be customized when an instance of TrainConfig is created.

Example:

from perceptrain import TrainConfig
c = TrainConfig(root_folder="/tmp/train")
TrainConfig(max_iter=10000, print_every=0, write_every=0, checkpoint_every=0, plot_every=0, live_plot_every=0, callbacks=[], log_model=False, root_folder='/tmp/train', create_subfolder_per_run=False, log_folder=PosixPath('.'), checkpoint_best_only=False, val_every=0, val_epsilon=1e-05, validation_criterion=None, trainstop_criterion=None, batch_size=1, verbose=True, tracking_tool=<ExperimentTrackingTool.TENSORBOARD: 'tensorboard'>, hyperparams={}, plotting_functions=(), _subfolders=[], nprocs=1, compute_setup='cpu', backend='gloo', log_setup='cpu', dtype=None, all_reduce_metrics=False)

all_reduce_metrics = False class-attribute instance-attribute

Whether to aggregate metrics (e.g., loss, accuracy) across processes.

When True, metrics from different training processes are averaged to provide consolidated metrics. Note: since aggregation requires a synchronization (all_reduce) operation, it can increase the computation time significantly.

backend = 'gloo' class-attribute instance-attribute

Backend used for distributed training communication.

The default is "gloo". Other options may include "nccl" - which is optimized for GPU-based training or "mpi", depending on your system and requirements. It should be one of the backends supported by torch.distributed. For further details, please look at torch backends

batch_size = 1 class-attribute instance-attribute

The batch size to use when processing a list or tuple of torch.Tensors.

This specifies how many samples are processed in each training iteration.

callbacks = field(default_factory=lambda: list()) class-attribute instance-attribute

List of callbacks to execute during training.

Callbacks can be used for custom behaviors, such as early stopping, custom logging, or other actions triggered at specific events.

checkpoint_best_only = False class-attribute instance-attribute

If True, checkpoints are only saved if there is an improvement in the validation metric. This conserves storage by only keeping the best models.

validation_criterion is required when this is set to True.

checkpoint_every = 0 class-attribute instance-attribute

Frequency (in epochs) for saving model and optimizer checkpoints during training.

Set to 0 to disable checkpointing. This helps in resuming training or recovering models. Note that setting checkpoint_best_only = True overrides this setting, and only the best checkpoints will be saved.

compute_setup = 'cpu' class-attribute instance-attribute

Compute device setup; options are "auto", "gpu", or "cpu".

  • "auto": Automatically uses GPU if available; otherwise, falls back to CPU.
  • "gpu": Forces GPU usage, raising an error if no CUDA device is available.
  • "cpu": Forces the use of CPU regardless of GPU availability.

create_subfolder_per_run = False class-attribute instance-attribute

Whether to create a subfolder for each run, named <id>_<timestamp>_<PID>.

This ensures logs and checkpoints from different runs do not overwrite each other, which is helpful for rapid prototyping. If False, training will resume from the latest checkpoint if one exists in the specified log folder.

dtype = None class-attribute instance-attribute

Data type (precision) for computations.

Both the model parameters and the dataset will be of the provided precision.

If not specified or None, the default torch precision (usually torch.float32) is used. If the provided dtype is torch.complex128, model parameters will be torch.complex128 and data parameters will be torch.float64.

hyperparams = field(default_factory=dict) class-attribute instance-attribute

A dictionary of hyperparameters to be tracked.

This can include learning rates, regularization parameters, or any other training-related configurations.

live_plot_every = 0 class-attribute instance-attribute

Frequency for live plotting all the metrics in a single dynamic subplot.

Set to 0 to disable.

For more personalized behaviour, such as showing only a subset of the metrics or arranging them over different subplots, leave this parameter at 0, define a LivePlotMetrics callback, and pass it to callbacks.

log_folder = Path('./') class-attribute instance-attribute

The log folder for saving checkpoints and tensorboard logs.

This stores the path where all logs and checkpoints are saved for this training session. log_folder takes precedence over root_folder, but it is ignored if create_subfolder_per_run=True (in which case subfolders are created inside the root folder).

log_model = False class-attribute instance-attribute

Whether to log a serialized version of the model.

When set to True, the model's state will be logged, useful for model versioning and reproducibility.

log_setup = 'cpu' class-attribute instance-attribute

Logging device setup; options are "auto" or "cpu".

  • "auto": Uses the same device for logging as for computation.
  • "cpu": Forces logging to occur on the CPU. This can be useful to avoid potential conflicts with GPU processes.

max_iter = 10000 class-attribute instance-attribute

Number of training iterations (epochs) to perform.

This defines the total number of times the model will be updated.

In case of InfiniteTensorDataset, each epoch will have 1 batch. In case of TensorDataset, each epoch will have len(dataloader) batches.

nprocs = 1 class-attribute instance-attribute

The number of processes to use for training when spawning subprocesses.

For effective parallel processing, set this to a value greater than 1. For Multi-GPU or Multi-Node-Multi-GPU setups, nprocs should be equal to the total number of GPUs across all nodes (the world size), or the total number of GPUs to be used.

If nprocs > 1, multiple processes will be spawned for training, and the training framework will launch additional processes (e.g., for distributed or parallel training).

  • For a CPU setup, this launches true parallel processes.
  • For a GPU setup, this launches a distributed training routine using PyTorch's DistributedDataParallel framework.
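As a sketch, a single-node configuration using two GPUs (one process per GPU, NCCL backend for GPU communication) could look like this:

```python
from perceptrain import TrainConfig

# two processes, one per GPU, communicating over NCCL
config = TrainConfig(
    nprocs=2,
    compute_setup="gpu",
    backend="nccl",
)
```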

plot_every = 0 class-attribute instance-attribute

Frequency (in epochs) for generating and saving figures during training.

Set to 0 to disable plotting.

plotting_functions = field(default_factory=tuple) class-attribute instance-attribute

Functions used for in-training plotting.

These are called to generate plots that are logged or saved at specified intervals.

print_every = 0 class-attribute instance-attribute

Frequency (in epochs) for printing loss and metrics to the console during training.

Set to 0 to disable this output, meaning that metrics and loss will not be printed during training.

root_folder = Path('./qml_logs') class-attribute instance-attribute

The root folder for saving checkpoints and tensorboard logs.

The default path is "./qml_logs".

This can be set to a specific directory where training artifacts are to be stored. Checkpoints will be saved inside a subfolder of this directory. Subfolders are created based on the create_subfolder_per_run argument.
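For instance, a minimal sketch that keeps the artifacts of each run in its own <id>_<timestamp>_<PID> subfolder:

```python
from perceptrain import TrainConfig

# each run writes into its own subfolder of ./experiments
config = TrainConfig(
    root_folder="./experiments",
    create_subfolder_per_run=True,
)
```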

tracking_tool = ExperimentTrackingTool.TENSORBOARD class-attribute instance-attribute

The tool used for tracking training progress and logging metrics.

Options include tools like TensorBoard, which help visualize and monitor model training.

trainstop_criterion = None class-attribute instance-attribute

A function to determine if the training process should stop based on a specific stopping metric. If None, training continues until max_iter is reached.

val_epsilon = 1e-05 class-attribute instance-attribute

A small safety margin used to compare the current validation loss with the best previous validation loss. This is used to determine improvements in metrics.

val_every = 0 class-attribute instance-attribute

Frequency (in epochs) for performing validation.

If set to 0, validation is not performed. Note that metrics from validation are always written, regardless of the write_every setting. When val_every > 0, an initial validation run also happens at the start of training: the initial metrics are written, and a checkpoint is saved (when checkpoint_best_only = False).

validation_criterion = None class-attribute instance-attribute

A function to evaluate whether a given validation metric meets a desired condition.

The validation_criterion has the following signature: def validation_criterion(val_loss: float, best_val_loss: float, val_epsilon: float) -> bool.

If None, no custom validation criterion is applied.
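As an illustration (not a library default), a criterion that only accepts improvements larger than val_epsilon could be written and plugged into the configuration together with checkpoint_best_only:

```python
from perceptrain import TrainConfig

def validation_criterion(val_loss: float, best_val_loss: float, val_epsilon: float) -> bool:
    # accept the new model only if the validation loss improves by more than val_epsilon
    return val_loss < best_val_loss - val_epsilon

config = TrainConfig(
    val_every=100,
    checkpoint_best_only=True,
    validation_criterion=validation_criterion,
)
```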

verbose = True class-attribute instance-attribute

Whether to print metrics and status messages during training.

If True, detailed metrics and status updates will be displayed in the console.

write_every = 0 class-attribute instance-attribute

Frequency (in epochs) for writing loss and metrics using the tracking tool during training.

Set to 0 to disable this logging, which prevents metrics from being logged to the tracking tool. Note that the metrics will always be written at the end of training regardless of this setting.

get_parameters(model)

Retrieve all trainable model parameters in a single vector.

PARAMETER DESCRIPTION
model

the input PyTorch model

TYPE: Module

RETURNS DESCRIPTION
Tensor

a 1-dimensional tensor with the parameters

TYPE: Tensor

Source code in perceptrain/parameters.py
def get_parameters(model: Module) -> Tensor:
    """Retrieve all trainable model parameters in a single vector.

    Args:
        model (Module): the input PyTorch model

    Returns:
        Tensor: a 1-dimensional tensor with the parameters
    """
    ps = [p.reshape(-1) for p in model.parameters() if p.requires_grad]
    return torch.concat(ps)
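A minimal usage sketch (the import path follows the source location noted above):

```python
from torch import nn
from perceptrain.parameters import get_parameters

model = nn.Linear(3, 2)        # 3*2 weights + 2 biases = 8 trainable parameters
theta = get_parameters(model)  # 1-D tensor containing all trainable parameters
assert theta.numel() == 8
```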

num_parameters(model)

Return the total number of parameters of the given model.

Source code in perceptrain/parameters.py
def num_parameters(model: Module) -> int:
    """Return the total number of parameters of the given model."""
    return len(get_parameters(model))

set_parameters(model, theta)

Set all trainable parameters of a model from a single vector.

Notice that this function assumes prior knowledge of the right number of parameters in the model.

PARAMETER DESCRIPTION
model

the input PyTorch model

TYPE: Module

theta

the parameters to assign

TYPE: Tensor

Source code in perceptrain/parameters.py
def set_parameters(model: Module, theta: Tensor) -> None:
    """Set all trainable parameters of a model from a single vector.

    Notice that this function assumes prior knowledge of right number
    of parameters in the model

    Args:
        model (Module): the input PyTorch model
        theta (Tensor): the parameters to assign
    """

    with torch.no_grad():
        idx = 0
        for ps in model.parameters():
            if ps.requires_grad:
                n = torch.numel(ps)
                if ps.ndim == 0:
                    ps[()] = theta[idx : idx + n]
                else:
                    ps[:] = theta[idx : idx + n].reshape(ps.size())
                idx += n
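A minimal round-trip sketch combining set_parameters with the helpers above (import path assumed from the source location):

```python
import torch
from torch import nn
from perceptrain.parameters import get_parameters, num_parameters, set_parameters

model = nn.Linear(3, 2)
n = num_parameters(model)              # total number of trainable parameters
set_parameters(model, torch.zeros(n))  # overwrite all trainable parameters in place
assert torch.all(get_parameters(model) == 0)
```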

optimize_step(model, optimizer, loss_fn, xs, device=None, dtype=None)

Default Torch optimize step with closure.

This is the default optimization step.

PARAMETER DESCRIPTION
model

The input model to be optimized.

TYPE: Module

optimizer

The chosen Torch optimizer.

TYPE: Optimizer

loss_fn

A custom loss function that returns the loss value and a dictionary of metrics.

TYPE: Callable

xs

The input data. If None, it means the given model does not require any input data.

TYPE: dict | list | Tensor | None

device

A target device to run computations on.

TYPE: device DEFAULT: None

dtype

Data type for xs conversion.

TYPE: dtype DEFAULT: None

RETURNS DESCRIPTION
tuple[Tensor | float, dict | None]

tuple[Tensor | float, dict | None]: A tuple containing the computed loss value and a dictionary with collected metrics.

Source code in perceptrain/optimize_step.py
def optimize_step(
    model: Module,
    optimizer: Optimizer,
    loss_fn: Callable,
    xs: dict | list | torch.Tensor | None,
    device: torch.device = None,
    dtype: torch.dtype = None,
) -> tuple[torch.Tensor | float, dict | None]:
    """Default Torch optimize step with closure.

    This is the default optimization step.

    Args:
        model (Module): The input model to be optimized.
        optimizer (Optimizer): The chosen Torch optimizer.
        loss_fn (Callable): A custom loss function
            that returns the loss value and a dictionary of metrics.
        xs (dict | list | Tensor | None): The input data. If None, it means
            the given model does not require any input data.
        device (torch.device): A target device to run computations on.
        dtype (torch.dtype): Data type for `xs` conversion.

    Returns:
        tuple[Tensor | float, dict | None]: A tuple containing the computed loss value
            and a dictionary with collected metrics.
    """

    loss, metrics = None, {}

    def closure() -> Any:
        # NOTE: We need the nonlocal as we can't return a metric dict and
        # because e.g. LBFGS calls this closure multiple times but for some
        # reason the returned loss is always the first one...
        nonlocal metrics, loss
        optimizer.zero_grad()
        loss, metrics = loss_fn(xs, model)
        loss.backward(retain_graph=True)
        return loss.item()

    optimizer.step(closure)
    # return the loss/metrics that are being mutated inside the closure...
    return loss, metrics
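A minimal usage sketch, assuming the import path shown in the source location above and a loss function that returns the loss together with a metrics dictionary:

```python
import torch
from torch import nn
from perceptrain.optimize_step import optimize_step  # import path assumed from the source location above

model = nn.Linear(2, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
x = torch.rand(16, 2)
y = torch.rand(16, 1)

def loss_fn(xs, model):
    # the loss function must return the loss value and a dictionary of metrics
    pred = model(xs)
    loss = ((pred - y) ** 2).mean()
    return loss, {"mse": loss.detach()}

loss, metrics = optimize_step(model, optimizer, loss_fn, x)
```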

DictDataLoader(dataloaders) dataclass

This class only holds a dictionary of DataLoaders and samples from them.
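A construction sketch, assuming the import path matches the other data classes in perceptrain/data.py:

```python
import torch
from perceptrain import to_dataloader
from perceptrain.data import DictDataLoader  # assumed import path

x1, y1 = torch.rand(10, 1), torch.rand(10, 1)
x2, y2 = torch.rand(20, 1), torch.rand(20, 1)

# one DataLoader per named dataset
dict_loader = DictDataLoader(
    {
        "task_a": to_dataloader(x1, y1, batch_size=5, infinite=True),
        "task_b": to_dataloader(x2, y2, batch_size=5, infinite=True),
    }
)
```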

GenerativeIterableDataset(proba_dist)

Bases: IterableDataset

Dataset for sampling from a probability distribution.

Samples once per iteration.

PARAMETER DESCRIPTION
proba_dist

the probability distribution to be sampled.

TYPE: Callable[[], Tensor]

Source code in perceptrain/data.py
def __init__(
    self,
    proba_dist: Callable[[], Tensor],
) -> None:
    """Dataset for sampling from a probability distribution.

    Samples once per iteration.

    Args:
        proba_dist: the probability distribution to be sampled.
    """
    self.proba_dist = proba_dist
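A minimal sketch, assuming that iterating the dataset yields one draw of proba_dist per step as described above:

```python
import torch
from perceptrain.data import GenerativeIterableDataset

# sampler drawing one 2-dimensional point from a standard normal distribution per call
dataset = GenerativeIterableDataset(lambda: torch.randn(2))

sample = next(iter(dataset))  # one fresh draw from proba_dist
```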

InfiniteTensorDataset(*tensors)

Bases: IterableDataset

Randomly sample points from the first dimension of the given tensors.

Behaves like a normal torch Dataset, except that it can be sampled from as many times as we want.

Examples:

import torch
from perceptrain.data import InfiniteTensorDataset

x_data, y_data = torch.rand(5,2), torch.ones(5,1)
# The dataset accepts any number of tensors with the same batch dimension
ds = InfiniteTensorDataset(x_data, y_data)

# call `next` to get one sample from each tensor:
xs = next(iter(ds))
(tensor([0.1578, 0.1867]), tensor([1.]))

Source code in perceptrain/data.py
def __init__(self, *tensors: Tensor):
    """Randomly sample points from the first dimension of the given tensors.

    Behaves like a normal torch `Dataset` just that we can sample from it as
    many times as we want.

    Examples:
    ```python exec="on" source="above" result="json"
    import torch
    from perceptrain.data import InfiniteTensorDataset

    x_data, y_data = torch.rand(5,2), torch.ones(5,1)
    # The dataset accepts any number of tensors with the same batch dimension
    ds = InfiniteTensorDataset(x_data, y_data)

    # call `next` to get one sample from each tensor:
    xs = next(iter(ds))
    print(str(xs)) # markdown-exec: hide
    ```
    """
    if len(set([t.size(0) for t in tensors])) != 1:
        raise ValueError("Size of first dimension must be the same for all tensors.")
    self.tensors = tensors
    self.indices = list(range(tensors[0].size(0)))

OptimizeResult(iteration, model, optimizer, loss=None, metrics=lambda: dict()(), extra=lambda: dict()(), rank=0, device='cpu') dataclass

OptimizeResult stores many optimization intermediate values.

At a given iteration, we store the model, optimizer, loss value, and metrics. An extra dict can be used to save other information for use in callbacks.

device = 'cpu' class-attribute instance-attribute

Device on which this result was calculated.

extra = field(default_factory=lambda: dict()) class-attribute instance-attribute

Extra dict for saving anything else to be used in callbacks.

iteration instance-attribute

Current iteration number.

loss = None class-attribute instance-attribute

Loss value.

metrics = field(default_factory=lambda: dict()) class-attribute instance-attribute

Metrics that can be saved during training.

model instance-attribute

Model at iteration.

optimizer instance-attribute

Optimizer at iteration.

rank = 0 class-attribute instance-attribute

Rank of the process for which this result was generated.

R3Dataset(proba_dist, n_samples, release_threshold=0.1)

Bases: Dataset

Dataset for R3 sampling (introduced in https://arxiv.org/abs/2207.02338#).

This is an evolutionary dataset that updates itself during training, based on the fitness values of the samples. It releases samples whose fitness value is below the threshold and retains them otherwise. The released samples are replaced by new samples generated from a probability distribution.

While this scheme was originally proposed for training physics-informed neural networks, this implementation can be used for any type of data that can be sampled from a probability distribution.

PARAMETER DESCRIPTION
proba_dist

Probability distribution function for generating features.

TYPE: Callable[[int], Tensor]

n_samples

Number of samples to generate.

TYPE: int

release_threshold

Threshold for releasing samples.

TYPE: float DEFAULT: 0.1

Source code in perceptrain/data.py
def __init__(
    self, proba_dist: Callable[[int], Tensor], n_samples: int, release_threshold: float = 0.1
) -> None:
    """Dataset for R3 sampling (introduced in https://arxiv.org/abs/2207.02338#).

    This is an evolutionary dataset, that updates itself during training, based on the fitness values of the samples.
    It releases samples if the corresponding fitness value is below the threshold and retains them otherwise.
    The released samples are replaced by new samples generated from a probability distribution.

    While this scheme was originally proposed for training physics-informed neural networks,
    this implementation can be used for any type of data that can be sampled from a probability distribution.

    Args:
        proba_dist: Probability distribution function for generating features.
        n_samples: Number of samples to generate.
        release_threshold: Threshold for releasing samples.
    """
    if release_threshold < 0.0:
        raise ValueError("Release threshold must be non-negative.")

    self.proba_dist = proba_dist
    self.n_samples = n_samples
    self.release_threshold = release_threshold

    self.features = proba_dist(n_samples)

    self._released: Tensor | None = None
    self._released_indices: Tensor | None = None
    self._resampled: Tensor | None = None

    self.n_released: int = 0
    self.n_retained: int = 0

update(fitness_values)

Update the dataset by releasing samples below fitness threshold and resampling.

PARAMETER DESCRIPTION
fitness_values

the fitness values of the samples.

TYPE: Tensor

Source code in perceptrain/data.py
def update(self, fitness_values: Tensor) -> None:
    """Update the dataset by releasing samples below fitness threshold and resampling.

    Args:
        fitness_values (Tensor): the fitness values of the samples.
    """
    self._release(fitness_values)
    if self.n_released > 0:
        new_samples = self._resample()

        with torch.no_grad():
            self.features[self._released_indices] = new_samples
    else:
        pass
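A minimal usage sketch, with an illustrative uniform sampler and fitness values standing in for quantities (e.g. per-sample residuals) computed by the training loop:

```python
import torch
from perceptrain.data import R3Dataset

# uniform sampler over [0, 1): n -> tensor of shape (n, 1)
dataset = R3Dataset(proba_dist=lambda n: torch.rand(n, 1), n_samples=100, release_threshold=0.1)

# fitness values for the current samples (here random, for illustration only)
fitness = torch.rand(100)
dataset.update(fitness)  # samples with fitness below 0.1 are released and replaced by new draws
```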

data_to_device(xs, *args, **kwargs)

Utility method to move arbitrary data to 'device'.

Source code in perceptrain/data.py
@singledispatch
def data_to_device(xs: Any, *args: Any, **kwargs: Any) -> Any:
    """Utility method to move arbitrary data to 'device'."""
    raise ValueError(f"Unable to move {type(xs)} with input args: {args} and kwargs: {kwargs}.")

to_dataloader(*tensors, batch_size=1, infinite=False, collate_fn=None)

Convert torch tensors into an (infinite) DataLoader.

PARAMETER DESCRIPTION
*tensors

Torch tensors to use in the dataloader.

TYPE: Tensor DEFAULT: ()

batch_size

batch size of sampled tensors

TYPE: int DEFAULT: 1

infinite

if True, the dataloader will keep sampling indefinitely even after the whole dataset was sampled once

TYPE: bool DEFAULT: False

collate_fn

function to collate the sampled tensors. Passed to torch.utils.data.DataLoader. If None, defaults to torch.utils.data.default_collate.

TYPE: Callable | None DEFAULT: None

Examples:

import torch
from perceptrain import to_dataloader

(x, y, z) = [torch.rand(10) for _ in range(3)]
loader = iter(to_dataloader(x, y, z, batch_size=5, infinite=True))
print(next(loader))
print(next(loader))
print(next(loader))
[tensor([0.3443, 0.8561, 0.1568, 0.7208, 0.0254]), tensor([0.0738, 0.6195, 0.6742, 0.9500, 0.8758]), tensor([0.1853, 0.2132, 0.4902, 0.3378, 0.1490])]
[tensor([0.2239, 0.4870, 0.3889, 0.7024, 0.9865]), tensor([0.0828, 0.4410, 0.6529, 0.2872, 0.1952]), tensor([0.0183, 0.5617, 0.3583, 0.7822, 0.9182])]
[tensor([0.3443, 0.8561, 0.1568, 0.7208, 0.0254]), tensor([0.0738, 0.6195, 0.6742, 0.9500, 0.8758]), tensor([0.1853, 0.2132, 0.4902, 0.3378, 0.1490])]
Source code in perceptrain/data.py
def to_dataloader(
    *tensors: Tensor,
    batch_size: int = 1,
    infinite: bool = False,
    collate_fn: Callable | None = None,
) -> DataLoader:
    """Convert torch tensors an (infinite) Dataloader.

    Arguments:
        *tensors: Torch tensors to use in the dataloader.
        batch_size: batch size of sampled tensors
        infinite: if `True`, the dataloader will keep sampling indefinitely even after the whole
            dataset was sampled once
        collate_fn: function to collate the sampled tensors. Passed to torch.utils.data.DataLoader.
            If None, defaults to torch.utils.data.default_collate.

    Examples:

    ```python exec="on" source="above" result="json"
    import torch
    from perceptrain import to_dataloader

    (x, y, z) = [torch.rand(10) for _ in range(3)]
    loader = iter(to_dataloader(x, y, z, batch_size=5, infinite=True))
    print(next(loader))
    print(next(loader))
    print(next(loader))
    ```
    """
    ds = InfiniteTensorDataset(*tensors) if infinite else TensorDataset(*tensors)
    return DataLoader(ds, batch_size=batch_size, collate_fn=collate_fn)