Trainer

This is the API documentation for the Trainer class.

Trainer(model, optimizer, config, loss_fn='mse', train_dataloader=None, val_dataloader=None, test_dataloader=None, optimize_step=optimize_step, max_batches=None)

Bases: BaseTrainer

Trainer class to manage and execute training, validation, and testing loops for a model (e.g. a QNN).

This class handles the overall training process, including:
  • Managing epochs and steps
  • Handling data loading and batching
  • Computing and updating gradients
  • Logging and monitoring training metrics

ATTRIBUTE DESCRIPTION
current_epoch

The current epoch number.

TYPE: int

global_step

The global step across all epochs.

TYPE: int

Inherited Attributes

use_grad (bool): Indicates if gradients are used for optimization. Default is True.

model (nn.Module): The neural network model.
optimizer (optim.Optimizer | NGOptimizer | None): The optimizer for training.
config (TrainConfig): The configuration settings for training.
train_dataloader (DataLoader | DictDataLoader | None): DataLoader for training data.
val_dataloader (DataLoader | DictDataLoader | None): DataLoader for validation data.
test_dataloader (DataLoader | DictDataLoader | None): DataLoader for testing data.

optimize_step (Callable): Function for performing an optimization step.
loss_fn (Callable): Loss function to use.

num_training_batches (int): Number of training batches.
num_validation_batches (int): Number of validation batches.
num_test_batches (int): Number of test batches.

state (str): Current state in the training process.

Default training routine

for epoch in range(max_iter + 1):
    # Training
    for batch in train_batches:
        train model
    # Validation
    if epoch % val_every == 0:
        for batch in val_batches:
            validate model

Notes
  • In case of InfiniteTensorDataset, the number of batches is 1 per epoch.
  • In case of TensorDataset, the number of batches is determined by the dataloader (optionally capped by max_batches).
  • Training is run for max_iter + 1 epochs (e.g. max_iter=100 runs epochs 0 through 100). Epoch 0 logs the untrained model.
  • Please look at the CallbackManager initialize_callbacks method to review the default logging behavior.

Examples: Important: this example uses qadence models (QNN) and should be used with care.

import torch
from torch.optim import SGD
from perceptrain import (
    feature_map,
    hamiltonian_factory,
    hea,
    QNN,
    QuantumCircuit,
    TrainConfig,
    Z,
)
from perceptrain.trainer import Trainer
from perceptrain.optimize_step import optimize_step
from perceptrain import TrainConfig
from perceptrain.data import to_dataloader

# Initialize the model
n_qubits = 2
fm = feature_map(n_qubits)
ansatz = hea(n_qubits=n_qubits, depth=2)
observable = hamiltonian_factory(n_qubits, detuning=Z)
circuit = QuantumCircuit(n_qubits, fm, ansatz)
model = QNN(circuit, observable, backend="pyqtorch", diff_mode="ad")

# Set up the optimizer
optimizer = SGD(model.parameters(), lr=0.001)

# Use TrainConfig for configuring the training process
config = TrainConfig(
    max_iter=100,
    print_every=10,
    write_every=10,
    checkpoint_every=10,
    val_every=10
)

# Create the Trainer instance with TrainConfig
trainer = Trainer(
    model=model,
    optimizer=optimizer,
    config=config,
    loss_fn="mse",
    optimize_step=optimize_step
)

batch_size = 25
x = torch.linspace(0, 1, 32).reshape(-1, 1)
y = torch.sin(x)
train_loader = to_dataloader(x, y, batch_size=batch_size, infinite=True)
val_loader = to_dataloader(x, y, batch_size=batch_size, infinite=False)

# Train the model
model, optimizer = trainer.fit(train_loader, val_loader)

The Trainer supports both gradient-based and gradient-free optimization; gradient-based optimization is the default.

Notes:

  • set_use_grad() (class level): sets the global use_grad flag, controlling whether the trainer uses gradient-based optimization.
    # gradient based
    Trainer.set_use_grad(True)
    
    # gradient free
    Trainer.set_use_grad(False)
    
  • Context Managers (instance level): enable_grad_opt() and disable_grad_opt() are context managers that temporarily switch the optimization mode for specific code blocks. This is useful when you want to mix gradient-based and gradient-free optimization in the same training process.
    # gradient based
    with trainer.enable_grad_opt(optimizer):
        trainer.fit()
    
    # gradient free
    with trainer.disable_grad_opt(ng_optimizer):
        trainer.fit()
    

Examples

Gradient-based optimization example usage:

from torch import optim
optimizer = optim.SGD(model.parameters(), lr=0.01)

Trainer.set_use_grad(True)
trainer = Trainer(
    model=model,
    optimizer=optimizer,
    config=config,
    loss_fn="mse"
)
trainer.fit(train_loader, val_loader)
or
trainer = Trainer(
    model=model,
    config=config,
    loss_fn="mse"
)
with trainer.enable_grad_opt(optimizer):
    trainer.fit(train_loader, val_loader)

Gradient-free optimization example usage:

import nevergrad as ng
from perceptrain.parameters import num_parameters
ng_optimizer = ng.optimizers.NGOpt(
    budget=config.max_iter, parametrization=num_parameters(model)
)

Trainer.set_use_grad(False)
trainer = Trainer(
    model=model,
    optimizer=ng_optimizer,
    config=config,
    loss_fn="mse"
)
trainer.fit(train_loader, val_loader)
or
import nevergrad as ng
from perceptrain.parameters import num_parameters
ng_optimizer = ng.optimizers.NGOpt(
    budget=config.max_iter, parametrization=num_parameters(model)
)

trainer = Trainer(
    model=model,
    config=config,
    loss_fn="mse"
)
with trainer.disable_grad_opt(ng_optimizer):
    trainer.fit(train_loader, val_loader)

Initializes the Trainer class.

PARAMETER DESCRIPTION
model

The PyTorch model to train.

TYPE: Module

optimizer

The optimizer for training.

TYPE: Optimizer | NGOptimizer | None

config

Training configuration object.

TYPE: TrainConfig

loss_fn

Loss function used for training. If not specified, default mse loss will be used.

TYPE: str | Callable DEFAULT: 'mse'
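A callable can also be passed for a custom loss. Below is a minimal sketch; it assumes the (batch, model) -> (loss, metrics) calling convention used by run_val_batch and run_test_batch further down, and reuses model, optimizer, and config from the example above.

import torch.nn.functional as F

def mse_with_rmse(batch, model):
    # Hypothetical custom loss: takes (batch, model) and returns (loss, metrics).
    x, y = batch
    y_pred = model(x)
    loss = F.mse_loss(y_pred, y)
    return loss, {"rmse": loss.detach().sqrt()}

trainer = Trainer(model=model, optimizer=optimizer, config=config, loss_fn=mse_with_rmse)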

train_dataloader

DataLoader for training data.

TYPE: DataLoader | DictDataLoader | None DEFAULT: None

val_dataloader

DataLoader for validation data.

TYPE: DataLoader | DictDataLoader | None DEFAULT: None

test_dataloader

DataLoader for test data.

TYPE: DataLoader | DictDataLoader | None DEFAULT: None

optimize_step

Function to execute an optimization step.

TYPE: Callable DEFAULT: optimize_step
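A custom optimize_step can also be supplied. The sketch below is illustrative only: it assumes the keyword arguments (model, optimizer, loss_fn, xs, device, dtype) that run_train_batch passes and a (loss, metrics) return value; it is not the library's default implementation.

def custom_optimize_step(model, optimizer, loss_fn, xs, device=None, dtype=None):
    # Hypothetical gradient-based step; device and dtype are accepted but unused here.
    optimizer.zero_grad()
    loss, metrics = loss_fn(xs, model)
    loss.backward()
    optimizer.step()
    return loss, metrics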

max_batches

Maximum number of batches to process per epoch. This is only valid for finite TensorDataset dataloaders. If max_batches is not None, the maximum number of batches used will be min(max_batches, len(dataloader.dataset)). In case of InfiniteTensorDataset, only 1 batch per epoch is used.

TYPE: int | None DEFAULT: None
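For example, a finite (non-infinite) dataloader can be capped at a fixed number of batches per epoch. A sketch reusing x, y, model, optimizer, and config from the example above:

finite_loader = to_dataloader(x, y, batch_size=4, infinite=False)
trainer = Trainer(
    model=model,
    optimizer=optimizer,
    config=config,
    loss_fn="mse",
    train_dataloader=finite_loader,
    max_batches=10,  # at most 10 batches are used per epoch
)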

Source code in perceptrain/trainer.py
def __init__(
    self,
    model: nn.Module,
    optimizer: optim.Optimizer | NGOptimizer | None,
    config: TrainConfig,
    loss_fn: str | Callable = "mse",
    train_dataloader: DataLoader | DictDataLoader | None = None,
    val_dataloader: DataLoader | DictDataLoader | None = None,
    test_dataloader: DataLoader | DictDataLoader | None = None,
    optimize_step: Callable = optimize_step,
    max_batches: int | None = None,
):
    """
    Initializes the Trainer class.

    Args:
        model (nn.Module): The PyTorch model to train.
        optimizer (optim.Optimizer | NGOptimizer | None): The optimizer for training.
        config (TrainConfig): Training configuration object.
        loss_fn (str | Callable ): Loss function used for training.
            If not specified, default mse loss will be used.
        train_dataloader (DataLoader | DictDataLoader |  None): DataLoader for training data.
        val_dataloader (DataLoader | DictDataLoader |  None): DataLoader for validation data.
        test_dataloader (DataLoader | DictDataLoader |  None): DataLoader for test data.
        optimize_step (Callable): Function to execute an optimization step.
        max_batches (int | None): Maximum number of batches to process per epoch.
            This is only valid in case of finite TensorDataset dataloaders.
            if max_batches is not None, the maximum number of batches used will
            be min(max_batches, len(dataloader.dataset))
            In case of InfiniteTensorDataset only 1 batch per epoch is used.
    """
    super().__init__(
        model=model,
        optimizer=optimizer,
        config=config,
        loss_fn=loss_fn,
        optimize_step=optimize_step,
        train_dataloader=train_dataloader,
        val_dataloader=val_dataloader,
        test_dataloader=test_dataloader,
        max_batches=max_batches,
    )
    self.current_epoch: int = 0
    self.global_step: int = 0
    self._stop_training: torch.Tensor = torch.tensor(0, dtype=torch.int)
    self.progress: Progress | None = None

    # Integration with Accelerator:
    self.accelerator = Accelerator(
        backend=config.backend,
        nprocs=config.nprocs,
        compute_setup=config.compute_setup,
        dtype=config.dtype,
        log_setup=config.log_setup,
    )
    # Decorate the unbound Trainer.fit method with accelerator.distribute.
    # We use __get__ to bind the decorated method to the current instance,
    # ensuring that 'self' is passed only once when self.fit is called.
    self.fit = self.accelerator.distribute(Trainer.fit).__get__(self, Trainer)  # type: ignore[method-assign]

build_optimize_result(result)

Builds and stores the optimization result by calculating the average loss and metrics.

Result (or loss_metrics) can have multiple formats:
  • None: no loss or metrics data is provided.
  • tuple[torch.Tensor, dict[str, Any]]: a single tuple containing the loss tensor and metrics dictionary, at the end of a batch.
  • list[tuple[torch.Tensor, dict[str, Any]]]: a list of tuples for multiple batches.
  • list[list[tuple[torch.Tensor, dict[str, Any]]]]: a list of lists of tuples, where each inner list represents metrics across multiple batches within an epoch.

PARAMETER DESCRIPTION
result

The loss and metrics data, which can have multiple formats (see above).

TYPE: None | tuple[Tensor, dict[Any, Any]] | list[tuple[Tensor, dict[Any, Any]]] | list[list[tuple[Tensor, dict[Any, Any]]]]

RETURNS DESCRIPTION
None

This method does not return anything. It sets self.opt_result with the computed average loss and metrics.

TYPE: None
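A minimal illustration of the batch-averaging behaviour with hand-made (loss, metrics) tuples; the values are invented for the example and trainer is the instance from the example above:

import torch

# list of (loss, metrics) tuples, i.e. one epoch with two batches
epoch_batches = [
    (torch.tensor(0.2), {"mae": torch.tensor(0.3)}),
    (torch.tensor(0.4), {"mae": torch.tensor(0.5)}),
]
trainer.build_optimize_result(epoch_batches)
# trainer.opt_result now stores the mean loss (0.3) and mean metrics ({"mae": 0.4})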

Source code in perceptrain/trainer.py
def build_optimize_result(
    self,
    result: (
        None
        | tuple[torch.Tensor, dict[Any, Any]]
        | list[tuple[torch.Tensor, dict[Any, Any]]]
        | list[list[tuple[torch.Tensor, dict[Any, Any]]]]
    ),
) -> None:
    """
    Builds and stores the optimization result by calculating the average loss and metrics.

    Result (or loss_metrics) can have multiple formats:
    - `None` Indicates no loss or metrics data is provided.
    - `tuple[torch.Tensor, dict[str, Any]]` A single tuple containing the loss tensor
        and metrics dictionary - at the end of batch.
    - `list[tuple[torch.Tensor, dict[str, Any]]]` A list of tuples for
        multiple batches.
    - `list[list[tuple[torch.Tensor, dict[str, Any]]]]` A list of lists of tuples,
    where each inner list represents metrics across multiple batches within an epoch.

    Args:
        result: (None |
                tuple[torch.Tensor, dict[Any, Any]] |
                list[tuple[torch.Tensor, dict[Any, Any]]] |
                list[list[tuple[torch.Tensor, dict[Any, Any]]]])
                    The loss and metrics data, which can have multiple formats

    Returns:
        None: This method does not return anything. It sets `self.opt_result` with
        the computed average loss and metrics.
    """
    loss_metrics = result
    if loss_metrics is None:
        loss = None
        metrics: dict[Any, Any] = {}
    elif isinstance(loss_metrics, tuple):
        # Single tuple case
        loss, metrics = loss_metrics
    else:
        last_epoch: list[tuple[torch.Tensor, dict[Any, Any]]] = []
        if isinstance(loss_metrics, list):
            # Check if it's a list of tuples
            if all(isinstance(item, tuple) for item in loss_metrics):
                last_epoch = cast(list[tuple[torch.Tensor, dict[Any, Any]]], loss_metrics)
            # Check if it's a list of lists of tuples
            elif all(isinstance(item, list) for item in loss_metrics):
                last_epoch = cast(
                    list[tuple[torch.Tensor, dict[Any, Any]]],
                    loss_metrics[-1] if loss_metrics else [],
                )
            else:
                raise ValueError(
                    "Invalid format for result: Expected None, tuple, list of tuples,"
                    " or list of lists of tuples."
                )

        if not last_epoch:
            loss, metrics = None, {}
        else:
            # Compute the average loss over the batches
            loss_tensor = torch.stack([loss_batch for loss_batch, _ in last_epoch])
            avg_loss = loss_tensor.mean()

            # Collect and average metrics for all batches
            metric_keys = last_epoch[0][1].keys()
            metrics_stacked: dict = {key: [] for key in metric_keys}

            for _, metrics_batch in last_epoch:
                for key in metric_keys:
                    value = metrics_batch[key]
                    metrics_stacked[key].append(value)

            avg_metrics = {key: torch.stack(metrics_stacked[key]).mean() for key in metric_keys}

            loss, metrics = avg_loss, avg_metrics

    # Store the optimization result
    self.opt_result = OptimizeResult(
        self.current_epoch,
        self.model,
        self.optimizer,
        loss,
        metrics,
        rank=self.accelerator.rank,
        device=self.accelerator.execution.device,
    )

fit(train_dataloader=None, val_dataloader=None)

Fits the model using the specified training configuration.

The dataloaders can be provided to train on new datasets, or the default dataloaders provided in the trainer will be used.

PARAMETER DESCRIPTION
train_dataloader

DataLoader for training data.

TYPE: DataLoader | DictDataLoader | None DEFAULT: None

val_dataloader

DataLoader for validation data.

TYPE: DataLoader | DictDataLoader | None DEFAULT: None

RETURNS DESCRIPTION
tuple[Module, Optimizer]

tuple[nn.Module, optim.Optimizer]: The trained model and optimizer.

Source code in perceptrain/trainer.py
def fit(
    self,
    train_dataloader: DataLoader | DictDataLoader | None = None,
    val_dataloader: DataLoader | DictDataLoader | None = None,
) -> tuple[nn.Module, optim.Optimizer]:
    """
    Fits the model using the specified training configuration.

    The dataloaders can be provided to train on new datasets, or the default dataloaders
    provided in the trainer will be used.

    Args:
        train_dataloader (DataLoader | DictDataLoader |  None): DataLoader for training data.
        val_dataloader (DataLoader | DictDataLoader |  None): DataLoader for validation data.

    Returns:
        tuple[nn.Module, optim.Optimizer]: The trained model and optimizer.
    """
    if train_dataloader is not None:
        self.train_dataloader = train_dataloader
    if val_dataloader is not None:
        self.val_dataloader = val_dataloader

    self._fit_setup()
    self._train()
    self._fit_end()
    self.training_stage = TrainingStage("idle")
    return self.model, self.optimizer

get_ic_grad_bounds(eta, epsilons, variation_multiple=20, dataloader=None)

Calculate the bounds on the gradient norm of the loss using Information Content.

PARAMETER DESCRIPTION
eta

The sensitivity IC.

TYPE: float

epsilons

The epsilons to use as thresholds for the discretization of the finite derivatives.

TYPE: Tensor

variation_multiple

The number of sets of variational parameters to generate per variational parameter. The number of variational parameters required for the statistical analysis scales linearly with the number of them present in the model; this is that linear factor.

TYPE: int DEFAULT: 20

dataloader

The dataloader for training data. A new dataloader can be provided, or the dataloader provided in the trainer will be used. If no dataloader is provided in either place, it is assumed that the model does not require any input data.

TYPE: DataLoader | DictDataLoader | None DEFAULT: None

RETURNS DESCRIPTION
tuple[float, float, float]

tuple[float, float, float]: The max IC lower bound, max IC upper bound, and sensitivity IC upper bound.

Examples:

import torch
from torch.optim.adam import Adam

from perceptrain.constructors import ObservableConfig
from perceptrain.config import AnsatzConfig, FeatureMapConfig, TrainConfig
from perceptrain.data import to_dataloader
from perceptrain import QNN
from perceptrain.optimize_step import optimize_step
from perceptrain.trainer import Trainer
from perceptrain.operations.primitive import Z

fm_config = FeatureMapConfig(num_features=1)
ansatz_config = AnsatzConfig(depth=4)
obs_config = ObservableConfig(detuning=Z)

qnn = QNN.from_configs(
    register=4,
    obs_config=obs_config,
    fm_config=fm_config,
    ansatz_config=ansatz_config,
)

optimizer = Adam(qnn.parameters(), lr=0.001)

batch_size = 25
x = torch.linspace(0, 1, 32).reshape(-1, 1)
y = torch.sin(x)
train_loader = to_dataloader(x, y, batch_size=batch_size, infinite=True)

train_config = TrainConfig(max_iter=100)

trainer = Trainer(
    model=qnn,
    optimizer=optimizer,
    config=train_config,
    loss_fn="mse",
    train_dataloader=train_loader,
    optimize_step=optimize_step,
)

# Perform exploratory landscape analysis with Information Content
ic_sensitivity_threshold = 1e-4
epsilons = torch.logspace(-2, 2, 10)

max_ic_lower_bound, max_ic_upper_bound, sensitivity_ic_upper_bound = (
    trainer.get_ic_grad_bounds(
        eta=ic_sensitivity_threshold,
        epsilons=epsilons,
    )
)

# Resume training as usual...

trainer.fit(train_loader)
Source code in perceptrain/trainer.py
def get_ic_grad_bounds(
    self,
    eta: float,
    epsilons: torch.Tensor,
    variation_multiple: int = 20,
    dataloader: DataLoader | DictDataLoader | None = None,
) -> tuple[float, float, float]:
    """
    Calculate the bounds on the gradient norm of the loss using Information Content.

    Args:
        eta (float): The sensitivity IC.
        epsilons (torch.Tensor): The epsilons to use as thresholds for the discretization of the
            finite derivatives.
        variation_multiple (int): The number of sets of variational parameters to generate per
            each variational parameter. The number of variational parameters required for the
            statistical analysis scales linearly with the amount of them present in the
            model. This is that linear factor.
        dataloader (DataLoader | DictDataLoader | None): The dataloader for training data. A
            new dataloader can be provided, or the dataloader provided in the trainer will be
            used. If no dataloader is provided in either place, it is assumed that the
            model does not require any input data.

    Returns:
        tuple[float, float, float]: The max IC lower bound, max IC upper bound, and sensitivity
            IC upper bound.

    Examples:
        ```python
        import torch
        from torch.optim.adam import Adam

        from perceptrain.constructors import ObservableConfig
        from perceptrain.config import AnsatzConfig, FeatureMapConfig, TrainConfig
        from perceptrain.data import to_dataloader
        from perceptrain import QNN
        from perceptrain.optimize_step import optimize_step
        from perceptrain.trainer import Trainer
        from perceptrain.operations.primitive import Z

        fm_config = FeatureMapConfig(num_features=1)
        ansatz_config = AnsatzConfig(depth=4)
        obs_config = ObservableConfig(detuning=Z)

        qnn = QNN.from_configs(
            register=4,
            obs_config=obs_config,
            fm_config=fm_config,
            ansatz_config=ansatz_config,
        )

        optimizer = Adam(qnn.parameters(), lr=0.001)

        batch_size = 25
        x = torch.linspace(0, 1, 32).reshape(-1, 1)
        y = torch.sin(x)
        train_loader = to_dataloader(x, y, batch_size=batch_size, infinite=True)

        train_config = TrainConfig(max_iter=100)

        trainer = Trainer(
            model=qnn,
            optimizer=optimizer,
            config=train_config,
            loss_fn="mse",
            train_dataloader=train_loader,
            optimize_step=optimize_step,
        )

        # Perform exploratory landscape analysis with Information Content
        ic_sensitivity_threshold = 1e-4
        epsilons = torch.logspace(-2, 2, 10)

        max_ic_lower_bound, max_ic_upper_bound, sensitivity_ic_upper_bound = (
            trainer.get_ic_grad_bounds(
                eta=ic_sensitivity_threshold,
                epsilons=epsilons,
            )
        )

        # Resume training as usual...

        trainer.fit(train_loader)
        ```
    """
    if not self._use_grad:
        logger.warning(
            "Gradient norm bounds are only relevant when using a gradient based optimizer. \
                Currently the trainer is set to use a gradient-free optimizer."
        )

    dataloader = dataloader if dataloader is not None else self.train_dataloader

    batch = next(iter(self._batch_iter(dataloader, num_batches=1)))

    ic = InformationContent(self.model, self.loss_fn, batch, epsilons)

    max_ic_lower_bound, max_ic_upper_bound = ic.get_grad_norm_bounds_max_IC()
    sensitivity_ic_upper_bound = ic.get_grad_norm_bounds_sensitivity_IC(eta)

    return max_ic_lower_bound, max_ic_upper_bound, sensitivity_ic_upper_bound

run_test_batch(batch)

Runs a single test batch.

PARAMETER DESCRIPTION
batch

Batch of data from the DataLoader.

TYPE: tuple[Tensor, ...]

RETURNS DESCRIPTION
tuple[Tensor, dict[str, Any]]

tuple[torch.Tensor, dict[str, Any]]: Loss and metrics for the batch.

Source code in perceptrain/trainer.py
@BaseTrainer.callback("test_batch")
def run_test_batch(
    self, batch: tuple[torch.Tensor, ...]
) -> tuple[torch.Tensor, dict[str, Any]]:
    """
    Runs a single test batch.

    Args:
        batch (tuple[torch.Tensor, ...]): Batch of data from the DataLoader.

    Returns:
        tuple[torch.Tensor, dict[str, Any]]: Loss and metrics for the batch.
    """
    with torch.no_grad():
        loss_metrics = self.loss_fn(batch, self.model)
    return self._modify_batch_end_loss_metrics(loss_metrics)

run_train_batch(batch)

Runs a single training batch, performing optimization.

We use the step function to optimize the model based on use_grad. use_grad = True entails gradient-based optimization, for which the optimize_step function is used. use_grad = False entails gradient-free optimization, for which the update_ng_parameters function is used.

PARAMETER DESCRIPTION
batch

Batch of data from the DataLoader.

TYPE: tuple[Tensor, ...]

RETURNS DESCRIPTION
tuple[Tensor, dict[str, Any]]

tuple[torch.Tensor, dict[str, Any]]: Loss and metrics for the batch, as a (loss, metrics) tuple.

Source code in perceptrain/trainer.py
@BaseTrainer.callback("train_batch")
def run_train_batch(
    self, batch: tuple[torch.Tensor, ...]
) -> tuple[torch.Tensor, dict[str, Any]]:
    """
    Runs a single training batch, performing optimization.

    We use the step function to optimize the model based on use_grad.
        use_grad = True entails gradient based optimization, for which we use
        optimize_step function.
        use_grad = False entails gradient free optimization, for which we use
        update_ng_parameters function.

    Args:
        batch (tuple[torch.Tensor, ...]): Batch of data from the DataLoader.

    Returns:
        tuple[torch.Tensor, dict[str, Any]]: Loss and metrics for the batch.
            tuple of (loss, metrics)
    """

    if self.use_grad:
        # Perform gradient-based optimization
        loss_metrics = self.optimize_step(
            model=self.model,
            optimizer=self.optimizer,
            loss_fn=self.loss_fn,
            xs=batch,
            device=self.accelerator.execution.device,
            dtype=self.accelerator.execution.data_dtype,
        )
    else:
        # Perform optimization using Nevergrad
        loss, metrics, ng_params = update_ng_parameters(
            model=self.model,
            optimizer=self.optimizer,
            loss_fn=self.loss_fn,
            data=batch,
            ng_params=self.ng_params,  # type: ignore[arg-type]
        )
        self.ng_params = ng_params
        loss_metrics = loss, metrics

    return self._modify_batch_end_loss_metrics(loss_metrics)

run_training(dataloader)

Runs the training for a single epoch, iterating over multiple batches.

PARAMETER DESCRIPTION
dataloader

DataLoader for training data.

TYPE: DataLoader

RETURNS DESCRIPTION
list[tuple[Tensor, dict[str, Any]]]

list[tuple[torch.Tensor, dict[str, Any]]]: Loss and metrics for each training batch, as a list of (loss, metrics) tuples.

Source code in perceptrain/trainer.py
@BaseTrainer.callback("train_epoch")
def run_training(self, dataloader: DataLoader) -> list[tuple[torch.Tensor, dict[str, Any]]]:
    """
    Runs the training for a single epoch, iterating over multiple batches.

    Args:
        dataloader (DataLoader): DataLoader for training data.

    Returns:
        list[tuple[torch.Tensor, dict[str, Any]]]: Loss and metrics for each batch.
            list                  -> tuples
            Training Batches      -> (loss, metrics)
    """
    self.model.train()
    train_epoch_loss_metrics = []

    for batch in self._batch_iter(dataloader, self.num_training_batches):
        self.on_train_batch_start(batch)
        train_batch_loss_metrics = self.run_train_batch(batch)
        if self.config.all_reduce_metrics:
            train_batch_loss_metrics = self._aggregate_result(train_batch_loss_metrics)
        train_epoch_loss_metrics.append(train_batch_loss_metrics)
        self.on_train_batch_end(train_batch_loss_metrics)

    return train_epoch_loss_metrics

run_val_batch(batch)

Runs a single validation batch.

PARAMETER DESCRIPTION
batch

Batch of data from the DataLoader.

TYPE: tuple[Tensor, ...]

RETURNS DESCRIPTION
tuple[Tensor, dict[str, Any]]

tuple[torch.Tensor, dict[str, Any]]: Loss and metrics for the batch.

Source code in perceptrain/trainer.py
@BaseTrainer.callback("val_batch")
def run_val_batch(self, batch: tuple[torch.Tensor, ...]) -> tuple[torch.Tensor, dict[str, Any]]:
    """
    Runs a single validation batch.

    Args:
        batch (tuple[torch.Tensor, ...]): Batch of data from the DataLoader.

    Returns:
        tuple[torch.Tensor, dict[str, Any]]: Loss and metrics for the batch.
    """
    with torch.no_grad():
        loss_metrics = self.loss_fn(batch, self.model)
    return self._modify_batch_end_loss_metrics(loss_metrics)

run_validation(dataloader)

Runs the validation loop for a single epoch, iterating over multiple batches.

PARAMETER DESCRIPTION
dataloader

DataLoader for validation data.

TYPE: DataLoader

RETURNS DESCRIPTION
list[tuple[Tensor, dict[str, Any]]]

list[tuple[torch.Tensor, dict[str, Any]]]: Loss and metrics for each validation batch, as a list of (loss, metrics) tuples.

Source code in perceptrain/trainer.py
@BaseTrainer.callback("val_epoch")
def run_validation(self, dataloader: DataLoader) -> list[tuple[torch.Tensor, dict[str, Any]]]:
    """
    Runs the validation loop for a single epoch, iterating over multiple batches.

    Args:
        dataloader (DataLoader): DataLoader for validation data.

    Returns:
        list[tuple[torch.Tensor, dict[str, Any]]]: Loss and metrics for each batch.
            list                  -> tuples
            Validation Batches      -> (loss, metrics)
    """
    self.model.eval()
    val_epoch_loss_metrics = []

    for batch in self._batch_iter(dataloader, self.num_validation_batches):
        self.on_val_batch_start(batch)
        val_batch_loss_metrics = self.run_val_batch(batch)
        if self.config.all_reduce_metrics:
            val_batch_loss_metrics = self._aggregate_result(val_batch_loss_metrics)
        val_epoch_loss_metrics.append(val_batch_loss_metrics)
        self.on_val_batch_end(val_batch_loss_metrics)

    return val_epoch_loss_metrics

stop_training()

Helper function to indicate if the training should be stopped.

We all_reduce the indicator across all processes to ensure all processes are stopped.

Notes

The self._stop_training indicator signals whether training should be stopped: 0 means continue, 1 means stop.
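As a sketch of how the indicator could be flipped, the subclass and threshold below are hypothetical; only the _stop_training tensor and the on_train_epoch_end hook come from this API:

import torch

class EarlyStopTrainer(Trainer):
    def on_train_epoch_end(self, train_epoch_loss_metrics):
        # Hypothetical early stopping: request a stop once the mean epoch loss is small enough.
        losses = [loss for loss, _ in train_epoch_loss_metrics]
        if losses and torch.stack(losses).mean() < 1e-3:
            self._stop_training = torch.tensor(1, dtype=torch.int)  # 1 means stop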

Source code in perceptrain/trainer.py
def stop_training(self) -> bool:
    """
    Helper function to indicate if the training should be stopped.

    We all_reduce the indicator across all processes to ensure all processes are stopped.

    Notes:
        self._stop_training indicator indicates if the training should be stopped.
        0 is continue. 1 is stop.
    """
    _stop_training = self.accelerator.all_reduce_dict(
        {"indicator": self._stop_training}, op="max"
    )
    return bool(_stop_training["indicator"] > 0)

test(test_dataloader=None)

Runs the testing loop if a test DataLoader is provided.

If test_dataloader is not provided, the default test_dataloader defined in the Trainer class is used.

PARAMETER DESCRIPTION
test_dataloader

DataLoader for test data.

TYPE: DataLoader DEFAULT: None

RETURNS DESCRIPTION
list[tuple[Tensor, dict[str, Any]]]

list[tuple[torch.Tensor, dict[str, Any]]]: Loss and metrics for each test batch, as a list of (loss, metrics) tuples.
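Example usage, reusing the trainer and data helpers from the fit example above:

test_loader = to_dataloader(x, y, batch_size=batch_size, infinite=False)
test_results = trainer.test(test_loader)  # one (loss, metrics) tuple per test batch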

Source code in perceptrain/trainer.py
def test(self, test_dataloader: DataLoader = None) -> list[tuple[torch.Tensor, dict[str, Any]]]:
    """
    Runs the testing loop if a test DataLoader is provided.

    if the test_dataloader is not provided, default test_dataloader defined
    in the Trainer class is used.

    Args:
        test_dataloader (DataLoader): DataLoader for test data.

    Returns:
        list[tuple[torch.Tensor, dict[str, Any]]]: Loss and metrics for each batch.
            list                    -> tuples
            Test Batches            -> (loss, metrics)
    """
    if test_dataloader is not None:
        self.test_dataloader = test_dataloader

    self.model.eval()
    test_loss_metrics = []

    for batch in self._batch_iter(test_dataloader, self.num_training_batches):
        self.on_test_batch_start(batch)
        loss_metrics = self.run_test_batch(batch)
        test_loss_metrics.append(loss_metrics)
        self.on_test_batch_end(loss_metrics)

    return test_loss_metrics

BaseTrainer(model, optimizer, config, loss_fn='mse', optimize_step=optimize_step, train_dataloader=None, val_dataloader=None, test_dataloader=None, max_batches=None)

Base class for training machine learning models using a given optimizer.

The base class implements context managers for gradient-based/gradient-free optimization, properties and property setters, input validations, a callback decorator generator, and empty hooks for the different training steps.

This class provides
  • Context managers for enabling/disabling gradient-based optimization
  • Properties for managing models, optimizers, and dataloaders
  • Input validations and a callback decorator generator
  • Config and callback managers using the provided TrainConfig
ATTRIBUTE DESCRIPTION
use_grad

Indicates if gradients are used for optimization. Default is True.

TYPE: bool

model

The neural network model.

TYPE: Module

optimizer

The optimizer for training.

TYPE: Optimizer | NGOptimizer | None

config

The configuration settings for training.

TYPE: TrainConfig

train_dataloader

DataLoader for training data.

TYPE: DataLoader | DictDataLoader | None

val_dataloader

DataLoader for validation data.

TYPE: DataLoader | DictDataLoader | None

test_dataloader

DataLoader for testing data.

TYPE: DataLoader | DictDataLoader | None

optimize_step

Function for performing an optimization step.

TYPE: Callable

loss_fn

Loss function to use. The default loss function is 'mse'.

TYPE: Callable | str

num_training_batches

Number of training batches. In case of InfiniteTensorDataset only 1 batch per epoch is used.

TYPE: int

num_validation_batches

Number of validation batches. In case of InfiniteTensorDataset only 1 batch per epoch is used.

TYPE: int

num_test_batches

Number of test batches. In case of InfiniteTensorDataset only 1 batch per epoch is used.

TYPE: int

state

Current state in the training process

TYPE: str

Initializes the BaseTrainer.

PARAMETER DESCRIPTION
model

The model to train.

TYPE: Module

optimizer

The optimizer for training.

TYPE: Optimizer | Optimizer | None

config

The TrainConfig settings for training.

TYPE: TrainConfig

loss_fn

The loss function to use. A str input selects a default loss function; currently supported: 'mse', 'cross_entropy'. If not specified, the default mse loss will be used.

TYPE: str | Callable DEFAULT: 'mse'

train_dataloader

DataLoader for training data. If the model does not need data to evaluate loss, no dataset should be provided.

TYPE: DataLoader | DictDataLoader | None DEFAULT: None

val_dataloader

DataLoader for validation data.

TYPE: DataLoader | DictDataLoader | None DEFAULT: None

test_dataloader

DataLoader for testing data.

TYPE: DataLoader | DictDataLoader | None DEFAULT: None

max_batches

Maximum number of batches to process per epoch. This is only valid for finite TensorDataset dataloaders. If max_batches is not None, the maximum number of batches used will be min(max_batches, len(dataloader.dataset)). In case of InfiniteTensorDataset, only 1 batch per epoch is used.

TYPE: int | None DEFAULT: None

Source code in perceptrain/train_utils/base_trainer.py
def __init__(
    self,
    model: nn.Module,
    optimizer: optim.Optimizer | NGOptimizer | None,
    config: TrainConfig,
    loss_fn: str | Callable = "mse",
    optimize_step: Callable = optimize_step,
    train_dataloader: DataLoader | DictDataLoader | None = None,
    val_dataloader: DataLoader | DictDataLoader | None = None,
    test_dataloader: DataLoader | DictDataLoader | None = None,
    max_batches: int | None = None,
):
    """
    Initializes the BaseTrainer.

    Args:
        model (nn.Module): The model to train.
        optimizer (optim.Optimizer | NGOptimizer | None): The optimizer
            for training.
        config (TrainConfig): The TrainConfig settings for training.
        loss_fn (str | Callable): The loss function to use.
            str input to be specified to use a default loss function.
            currently supported loss functions: 'mse', 'cross_entropy'.
            If not specified, default mse loss will be used.
        train_dataloader (Dataloader | DictDataLoader | None): DataLoader for training data.
            If the model does not need data to evaluate loss, no dataset
            should be provided.
        val_dataloader (Dataloader | DictDataLoader | None): DataLoader for validation data.
        test_dataloader (Dataloader | DictDataLoader | None): DataLoader for testing data.
        max_batches (int | None): Maximum number of batches to process per epoch.
            This is only valid in case of finite TensorDataset dataloaders.
            if max_batches is not None, the maximum number of batches used will
            be min(max_batches, len(dataloader.dataset))
            In case of InfiniteTensorDataset only 1 batch per epoch is used.
    """
    self._model: nn.Module
    self._optimizer: optim.Optimizer | NGOptimizer | None
    self._config: TrainConfig
    self._train_dataloader: DataLoader | DictDataLoader | None = None
    self._val_dataloader: DataLoader | DictDataLoader | None = None
    self._test_dataloader: DataLoader | DictDataLoader | None = None

    self.config = config
    self.model = model
    self.optimizer = optimizer
    self.max_batches = max_batches

    self.num_training_batches: int
    self.num_validation_batches: int
    self.num_test_batches: int

    self.train_dataloader = train_dataloader
    self.val_dataloader = val_dataloader
    self.test_dataloader = test_dataloader

    self.loss_fn: Callable = get_loss(loss_fn)
    self.optimize_step: Callable = optimize_step
    self.ng_params: ng.p.Array
    self.training_stage: TrainingStage = TrainingStage("idle")

config property writable

Returns the training configuration.

RETURNS DESCRIPTION
TrainConfig

The configuration object.

TYPE: TrainConfig

model property writable

Returns the model if set, otherwise raises an error.

RETURNS DESCRIPTION
Module

nn.Module: The model.

optimizer property writable

Returns the optimizer if set, otherwise raises an error.

RETURNS DESCRIPTION
Optimizer | Optimizer | None

optim.Optimizer | NGOptimizer | None: The optimizer.

test_dataloader property writable

Returns the test DataLoader, validating its type.

RETURNS DESCRIPTION
DataLoader

The DataLoader for testing data.

TYPE: DataLoader

train_dataloader property writable

Returns the training DataLoader, validating its type.

RETURNS DESCRIPTION
DataLoader

The DataLoader for training data.

TYPE: DataLoader

use_grad property writable

Returns the optimization framework for the trainer.

use_grad = True: gradient-based optimization
use_grad = False: gradient-free optimization

RETURNS DESCRIPTION
bool

Bool value for using gradient.

TYPE: bool

val_dataloader property writable

Returns the validation DataLoader, validating its type.

RETURNS DESCRIPTION
DataLoader

The DataLoader for validation data.

TYPE: DataLoader

callback(phase) staticmethod

Decorator for executing callbacks before and after a phase.

Phases are different hooks during the training. The list of valid phases is defined in Callbacks. We also update the current state of the training process in the callback decorator.

PARAMETER DESCRIPTION
phase

The phase for which the callback is executed (e.g., "train", "train_epoch", "train_batch").

TYPE: str

RETURNS DESCRIPTION
Callable

The decorated function.

TYPE: Callable

Source code in perceptrain/train_utils/base_trainer.py
@staticmethod
def callback(phase: str) -> Callable:
    """
    Decorator for executing callbacks before and after a phase.

    Phase are different hooks during the training. list of valid
    phases is defined in Callbacks.
    We also update the current state of the training process in
    the callback decorator.

    Args:
        phase (str): The phase for which the callback is executed (e.g., "train",
            "train_epoch", "train_batch").

    Returns:
        Callable: The decorated function.
    """

    def decorator(method: Callable) -> Callable:
        def wrapper(self: Any, *args: Any, **kwargs: Any) -> Any:
            start_event = f"{phase}_start"
            end_event = f"{phase}_end"

            self.training_stage = TrainingStage(start_event)
            self.callback_manager.run_callbacks(trainer=self)
            result = method(self, *args, **kwargs)

            self.training_stage = TrainingStage(end_event)
            # build_optimize_result method is defined in the trainer.
            self.build_optimize_result(result)
            self.callback_manager.run_callbacks(trainer=self)

            return result

        return wrapper

    return decorator

disable_grad_opt(optimizer=None)

Context manager to temporarily disable gradient-based optimization.

PARAMETER DESCRIPTION
optimizer

The Nevergrad optimizer to use. If no optimizer is provided, the default optimizer of the trainer object will be used.

TYPE: NGOptimizer DEFAULT: None

Source code in perceptrain/train_utils/base_trainer.py
@contextmanager
def disable_grad_opt(self, optimizer: NGOptimizer | None = None) -> Iterator[None]:
    """
    Context manager to temporarily disable gradient-based optimization.

    Args:
        optimizer (NGOptimizer): The Nevergrad optimizer to use.
            If no optimizer is provided, default optimizer for trainer
            object will be used.
    """
    original_mode = self.use_grad
    original_optimizer = self._optimizer
    try:
        self.use_grad = False
        self.callback_manager.use_grad = False
        self.optimizer = optimizer if optimizer else self.optimizer
        yield
    finally:
        self.use_grad = original_mode
        self.callback_manager.use_grad = original_mode
        self.optimizer = original_optimizer

enable_grad_opt(optimizer=None)

Context manager to temporarily enable gradient-based optimization.

PARAMETER DESCRIPTION
optimizer

The PyTorch optimizer to use. If no optimizer is provided, the default optimizer of the trainer object will be used.

TYPE: Optimizer DEFAULT: None

Source code in perceptrain/train_utils/base_trainer.py
@contextmanager
def enable_grad_opt(self, optimizer: optim.Optimizer | None = None) -> Iterator[None]:
    """
    Context manager to temporarily enable gradient-based optimization.

    Args:
        optimizer (optim.Optimizer): The PyTorch optimizer to use.
            If no optimizer is provided, default optimizer for trainer
            object will be used.
    """
    original_mode = self.use_grad
    original_optimizer = self._optimizer
    try:
        self.use_grad = True
        self.callback_manager.use_grad = True
        self.optimizer = optimizer if optimizer else self.optimizer
        yield
    finally:
        self.use_grad = original_mode
        self.callback_manager.use_grad = original_mode
        self.optimizer = original_optimizer

on_test_batch_end(test_batch_loss_metrics)

Called at the end of each testing batch.

PARAMETER DESCRIPTION
test_batch_loss_metrics

Metrics for the testing batch loss. tuple of (loss, metrics)

TYPE: tuple[Tensor, Any]

Source code in perceptrain/train_utils/base_trainer.py
def on_test_batch_end(self, test_batch_loss_metrics: tuple[torch.Tensor, Any]) -> None:
    """
    Called at the end of each testing batch.

    Args:
        test_batch_loss_metrics: Metrics for the testing batch loss.
            tuple of (loss, metrics)
    """
    pass

on_test_batch_start(batch)

Called at the start of each testing batch.

PARAMETER DESCRIPTION
batch

A batch of data from the DataLoader. Typically a tuple containing input tensors and corresponding target tensors.

TYPE: tuple[Tensor, ...] | None

Source code in perceptrain/train_utils/base_trainer.py
def on_test_batch_start(self, batch: tuple[torch.Tensor, ...] | None) -> None:
    """
    Called at the start of each testing batch.

    Args:
        batch: A batch of data from the DataLoader. Typically a tuple containing
            input tensors and corresponding target tensors.
    """
    pass

on_train_batch_end(train_batch_loss_metrics)

Called at the end of each training batch.

PARAMETER DESCRIPTION
train_batch_loss_metrics

Metrics for the training batch loss. tuple of (loss, metrics)

TYPE: tuple[Tensor, Any]

Source code in perceptrain/train_utils/base_trainer.py
def on_train_batch_end(self, train_batch_loss_metrics: tuple[torch.Tensor, Any]) -> None:
    """
    Called at the end of each training batch.

    Args:
        train_batch_loss_metrics: Metrics for the training batch loss.
            tuple of (loss, metrics)
    """
    pass

on_train_batch_start(batch)

Called at the start of each training batch.

PARAMETER DESCRIPTION
batch

A batch of data from the DataLoader. Typically a tuple containing input tensors and corresponding target tensors.

TYPE: tuple[Tensor, ...] | None

Source code in perceptrain/train_utils/base_trainer.py
def on_train_batch_start(self, batch: tuple[torch.Tensor, ...] | None) -> None:
    """
    Called at the start of each training batch.

    Args:
        batch: A batch of data from the DataLoader. Typically a tuple containing
            input tensors and corresponding target tensors.
    """
    pass

on_train_end(train_losses, val_losses=None)

Called at the end of training.

PARAMETER DESCRIPTION
train_losses

Metrics for the training losses: a list over epochs of lists over training batches of (loss, metrics) tuples.

TYPE: list[list[tuple[Tensor, Any]]]

val_losses

Metrics for the validation losses: a list over epochs of lists over validation batches of (loss, metrics) tuples.

TYPE: list[list[tuple[Tensor, Any]]] | None DEFAULT: None

Source code in perceptrain/train_utils/base_trainer.py
def on_train_end(
    self,
    train_losses: list[list[tuple[torch.Tensor, Any]]],
    val_losses: list[list[tuple[torch.Tensor, Any]]] | None = None,
) -> None:
    """
    Called at the end of training.

    Args:
        train_losses (list[list[tuple[torch.Tensor, Any]]]):
            Metrics for the training losses.
            list    -> list                  -> tuples
            Epochs  -> Training Batches      -> (loss, metrics)
        val_losses (list[list[tuple[torch.Tensor, Any]]] | None):
            Metrics for the validation losses.
            list    -> list                  -> tuples
            Epochs  -> Validation Batches    -> (loss, metrics)
    """
    pass

on_train_epoch_end(train_epoch_loss_metrics)

Called at the end of each training epoch.

PARAMETER DESCRIPTION
train_epoch_loss_metrics

Metrics for the training epoch losses: a list over training batches of (loss, metrics) tuples.

TYPE: list[tuple[Tensor, Any]]

Source code in perceptrain/train_utils/base_trainer.py
def on_train_epoch_end(self, train_epoch_loss_metrics: list[tuple[torch.Tensor, Any]]) -> None:
    """
    Called at the end of each training epoch.

    Args:
        train_epoch_loss_metrics: Metrics for the training epoch losses.
            list                  -> tuples
            Training Batches      -> (loss, metrics)
    """
    pass

on_train_epoch_start()

Called at the start of each training epoch.

Source code in perceptrain/train_utils/base_trainer.py
def on_train_epoch_start(self) -> None:
    """Called at the start of each training epoch."""
    pass

on_train_start()

Called at the start of training.

Source code in perceptrain/train_utils/base_trainer.py
def on_train_start(self) -> None:
    """Called at the start of training."""
    pass

on_val_batch_end(val_batch_loss_metrics)

Called at the end of each validation batch.

PARAMETER DESCRIPTION
val_batch_loss_metrics

Metrics for the validation batch loss. tuple of (loss, metrics)

TYPE: tuple[Tensor, Any]

Source code in perceptrain/train_utils/base_trainer.py
def on_val_batch_end(self, val_batch_loss_metrics: tuple[torch.Tensor, Any]) -> None:
    """
    Called at the end of each validation batch.

    Args:
        val_batch_loss_metrics: Metrics for the validation batch loss.
            tuple of (loss, metrics)
    """
    pass

on_val_batch_start(batch)

Called at the start of each validation batch.

PARAMETER DESCRIPTION
batch

A batch of data from the DataLoader. Typically a tuple containing input tensors and corresponding target tensors.

TYPE: tuple[Tensor, ...] | None

Source code in perceptrain/train_utils/base_trainer.py
def on_val_batch_start(self, batch: tuple[torch.Tensor, ...] | None) -> None:
    """
    Called at the start of each validation batch.

    Args:
        batch: A batch of data from the DataLoader. Typically a tuple containing
            input tensors and corresponding target tensors.
    """
    pass

on_val_epoch_end(val_epoch_loss_metrics)

Called at the end of each validation epoch.

PARAMETER DESCRIPTION
val_epoch_loss_metrics

Metrics for the validation epoch loss: a list over validation batches of (loss, metrics) tuples.

TYPE: list[tuple[Tensor, Any]]

Source code in perceptrain/train_utils/base_trainer.py
def on_val_epoch_end(self, val_epoch_loss_metrics: list[tuple[torch.Tensor, Any]]) -> None:
    """
    Called at the end of each validation epoch.

    Args:
        val_epoch_loss_metrics: Metrics for the validation epoch loss.
            list                    -> tuples
            Validation Batches      -> (loss, metrics)
    """
    pass

on_val_epoch_start()

Called at the start of each validation epoch.

Source code in perceptrain/train_utils/base_trainer.py
def on_val_epoch_start(self) -> None:
    """Called at the start of each validation epoch."""
    pass

set_use_grad(value) classmethod

Sets the global use_grad flag.

PARAMETER DESCRIPTION
value

Whether to use gradient-based optimization.

TYPE: bool

Source code in perceptrain/train_utils/base_trainer.py
@classmethod
def set_use_grad(cls, value: bool) -> None:
    """
    Sets the global use_grad flag.

    Args:
        value (bool): Whether to use gradient-based optimization.
    """
    if not isinstance(value, bool):
        raise TypeError("use_grad must be a boolean value.")
    cls._use_grad = value