Accelerator
Accelerator(nprocs=1, compute_setup='auto', log_setup='cpu', backend='gloo', dtype=None)
Bases: Distributor
A class for handling distributed training.

This class extends Distributor to manage distributed training using PyTorch's torch.distributed API. It supports spawning multiple processes and wrapping models with DistributedDataParallel (DDP) when required.

This class provides a head-level method, distribute(), which wraps a function at the head-process level before launching nprocs processes as required. Furthermore, it provides process-level methods, such as prepare() and prepare_batch(), which can be run inside each process for the correct movement and preparation of models, optimizers, and datasets.
Inherited Attributes

- nprocs (int): Number of processes to launch for distributed training.
- execution (BaseExecution): Detected execution instance for process launch (e.g., "torchrun", "default").
- execution_type (ExecutionType): Type of execution used.
- rank (int): Global rank of the process (set during environment setup).
- world_size (int): Total number of processes (set during environment setup).
- local_rank (int | None): Local rank on the node (set during environment setup).
- master_addr (str): Master node address (set during environment setup).
- master_port (str): Master node port (set during environment setup).
- node_rank (int): Rank of the node in the cluster setup.
There are three different indicators for the number of processes executed, as illustrated in the sketch below.

- self._config_nprocs: Number of processes specified by the user, provided at initialization of the Accelerator (e.g., acc = Accelerator(nprocs=2)).
- self.nprocs: Number of processes defined at the head level.
    - When the accelerator is used to spawn processes (e.g., the default python execution), nprocs = _config_nprocs.
    - When an external elastic launcher is used to spawn processes (e.g., torchrun), nprocs = 1. This is because the external launcher already spawns multiple processes, and the accelerator init is called from each process.
- self.world_size: Number of processes actually executed.
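Below is a minimal sketch of how these indicators relate when the accelerator itself spawns the processes. It assumes the class is importable from perceptrain.train_utils.accelerator, and that world_size is only populated once the environment has been set up.

```python
from perceptrain.train_utils.accelerator import Accelerator

# User requests 2 processes; with the default (non-torchrun) execution the
# accelerator spawns them itself, so the head-level count matches the request.
acc = Accelerator(nprocs=2)

print(acc._config_nprocs)  # 2 -> what the user asked for
print(acc.nprocs)          # 2 -> head-level count (would be 1 under torchrun)
print(acc.world_size)      # processes actually executed,
                           # populated during environment setup
```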
Initializes the Accelerator class.
| PARAMETER | DESCRIPTION |
|---|---|
| nprocs | Number of processes to launch. Default is 1. |
| compute_setup | Compute device setup; options are "auto" (default), "gpu", or "cpu". "auto": uses GPU if available, otherwise CPU. "gpu": forces GPU usage, raising an error if no CUDA device is available. "cpu": forces CPU usage. |
| log_setup | Logging device setup; options are "auto" or "cpu" (default). "auto": logs on the same device used for computation. "cpu": forces CPU logging. |
| backend | The backend for distributed communication. Default is "gloo". |
| dtype | Data type for controlling numerical precision. Default is None. |
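A minimal construction sketch. It assumes the class is importable from perceptrain.train_utils.accelerator and that dtype accepts a torch.dtype.

```python
import torch
from perceptrain.train_utils.accelerator import Accelerator

accelerator = Accelerator(
    nprocs=2,              # spawn two training processes
    compute_setup="auto",  # GPU if available, otherwise CPU
    log_setup="cpu",       # keep logging on CPU
    backend="gloo",        # distributed communication backend
    dtype=torch.float32,   # numerical precision (assumed to be a torch.dtype)
)
```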
all_reduce_dict(d, op='mean')
Performs an all-reduce operation on a dictionary of tensors, averaging values across all processes.
| PARAMETER | DESCRIPTION |
|---|---|
| d | A dictionary where values are tensors to be reduced across processes. |
| op | Reduction operation to all-reduce with. Default is "mean". |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Tensor] | A dictionary with the reduced tensors, averaged over the world size. |
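A hedged usage sketch, assuming accelerator is an Accelerator instance running inside a spawned process; the metric names and values are illustrative.

```python
import torch

# Per-process metrics computed during a training step (illustrative values).
metrics = {"loss": torch.tensor(0.42), "accuracy": torch.tensor(0.91)}

# Average each tensor entry over all processes in the group.
reduced = accelerator.all_reduce_dict(metrics, op="mean")
print(reduced["loss"])  # mean loss across all ranks
```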
broadcast(obj, src)
Broadcasts an object from the source process to all processes.
| PARAMETER | DESCRIPTION |
|---|---|
| obj | The object to broadcast on the source process. On non-source processes, this value is ignored. |
| src | The source process rank. |

| RETURNS | DESCRIPTION |
|---|---|
| Any | The broadcast object from the source process. |
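A hedged sketch, assuming accelerator is an initialized instance whose rank attribute has been set during environment setup:

```python
# Build the payload only on rank 0; other ranks pass a placeholder,
# which is ignored on non-source processes.
payload = {"best_lr": 1e-3} if accelerator.rank == 0 else None
payload = accelerator.broadcast(payload, src=0)
# Every rank now holds the object created on rank 0.
```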
distribute(fun)
Decorator to distribute the fit function across multiple processes.

This decorator is generic and can work with other methods as well, whether bound or unbound. When applied to a function (typically a fit function), it will execute the function in a distributed fashion using torch.multiprocessing. The number of processes used is determined by self.nprocs, and if multiple nodes are involved (self.num_nodes > 1), the process count is adjusted accordingly. In single-process mode (self.nprocs is 1), the function is executed directly in the current process.

After execution, the decorator returns the model stored in instance.model.
| PARAMETER | DESCRIPTION |
|---|---|
| fun | The function to be decorated. This function usually implements a model fitting or training routine. |

| RETURNS | DESCRIPTION |
|---|---|
| callable | The wrapped function. When called, it will execute in distributed mode (if configured) and return the value of instance.model. |
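A hedged sketch of decorating a training routine. The Trainer class, its attributes, and the data below are hypothetical; only the Accelerator calls follow the documented API, and the exact decoration pattern used inside the library may differ.

```python
import torch
from perceptrain.train_utils.accelerator import Accelerator

class Trainer:
    # Hypothetical trainer; distribute() returns whatever is stored in self.model.
    def __init__(self, model, nprocs=2):
        self.model = model
        self.accelerator = Accelerator(nprocs=nprocs)

    def fit(self, dataloader):
        ...  # per-process training loop using prepare()/prepare_batch()

dataset = torch.utils.data.TensorDataset(torch.randn(32, 4), torch.randn(32, 1))
dataloader = torch.utils.data.DataLoader(dataset, batch_size=8)

trainer = Trainer(model=torch.nn.Linear(4, 1), nprocs=2)
# Wrap the bound method: calling the wrapper launches nprocs processes
# and returns trainer.model once they all complete.
fit_distributed = trainer.accelerator.distribute(trainer.fit)
trained_model = fit_distributed(dataloader)
```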
is_class_method(fun, args)
Determines if fun is a class method or a standalone function.

The first argument in args should be either:
- an object that has a __dict__, making it a class instance, or
- an object that has a method named fun, making it a class that defines this method.
| PARAMETER | DESCRIPTION |
|---|---|
| fun | The function being checked. |
| args | The arguments passed to the function. |

| RETURNS | DESCRIPTION |
|---|---|
| bool | True if fun is a class method, False otherwise. |
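An illustrative sketch, assuming accelerator is an existing instance; Owner is a hypothetical class used only to show the two detection rules.

```python
class Owner:
    def fit(self):  # hypothetical method, only for illustration
        ...

owner = Owner()

# The first argument exposes __dict__ and a method named like `fun`,
# so this should be detected as a class method.
accelerator.is_class_method(Owner.fit, (owner,))      # expected: True

# A bare float as the first argument has neither, so this should not.
accelerator.is_class_method(lambda x: 2 * x, (3.0,))  # expected: False
```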
prepare(*args)
Prepares models, optimizers, and dataloaders for distributed training.
This method iterates over the provided objects and:

- Moves models to the specified device (e.g., GPU or CPU) and casts them to the desired precision (specified by self.dtype). It then wraps models in DistributedDataParallel (DDP) if more than one device is used.
- Passes through optimizers unchanged.
- For dataloaders, adjusts them to use a distributed sampler (if applicable) by calling a helper method. Note that only the sampler is prepared; moving the actual batch data to the device is handled separately during training. Please use the prepare_batch method to move the batch to the correct device/dtype.
| PARAMETER | DESCRIPTION |
|---|---|
| *args | A variable number of objects to be prepared. These can include PyTorch models, optimizers, and dataloaders. |

| RETURNS | DESCRIPTION |
|---|---|
| tuple[Any, ...] | A tuple containing the prepared objects, where each object has been modified as needed to support distributed training. |
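A hedged sketch of the typical call, assuming it runs inside a spawned process where accelerator has already been set up:

```python
import torch

model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
dataset = torch.utils.data.TensorDataset(torch.randn(32, 4), torch.randn(32, 1))
dataloader = torch.utils.data.DataLoader(dataset, batch_size=8)

# The model is moved/cast (and wrapped in DDP when several devices are used),
# the optimizer passes through, and the dataloader gets a distributed sampler.
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
```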
prepare_batch(batch)
Moves a batch of data to the target device and casts it to the desired data dtype.
This method is typically called within the optimization step of your training loop. It supports various batch formats:

- If the batch is a dictionary, each value is moved individually.
- If the batch is a tuple or list, each element is processed and returned as a tuple.
- Otherwise, the batch is processed directly.
| PARAMETER | DESCRIPTION |
|---|---|
| batch | The batch of data to move to the device. This can be a dict, tuple, list, or any other compatible type. |

| RETURNS | DESCRIPTION |
|---|---|
| Any | The batch with all elements moved to the target device and cast to the desired dtype. |
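Continuing the sketch from prepare() above, a hedged example of the optimization step; the (x, y) batch structure is an assumption about the dataloader used there.

```python
import torch

for batch in dataloader:
    # Move inputs/targets to the compute device and cast them to the
    # configured dtype before the forward/backward pass.
    x, y = accelerator.prepare_batch(batch)
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
```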
worker(rank, instance, fun, args, kwargs)
Worker function to be executed in each spawned process.
This function is called in every subprocess created by torch.multiprocessing (via mp.spawn). It performs the following tasks:

1. Sets up the accelerator for the given process rank. This typically involves configuring the GPU or other hardware resources for distributed training.
2. If the retrieved method has been decorated (i.e. it has a 'wrapped' attribute), the original, unwrapped function is invoked with the given arguments. Otherwise, the method is called directly.
| PARAMETER | DESCRIPTION |
|---|---|
| rank | The rank (or identifier) of the spawned process. |
| instance | The object (Trainer) that contains the method to execute. |
| fun | The function of the method on the instance to be executed. |
| args | Positional arguments to pass to the target method. |
| kwargs | Keyword arguments to pass to the target method. |
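For orientation only, a hedged sketch of how a spawning launcher typically invokes this worker. It mirrors torch.multiprocessing's spawn signature, but the library's internal wiring may differ; trainer and dataloader are assumed to exist as in the distribute() sketch above.

```python
import torch.multiprocessing as mp

# Each spawned process receives its rank as the first argument, followed by
# the (instance, fun, args, kwargs) documented above.
mp.spawn(
    accelerator.worker,
    args=(trainer, trainer.fit, (dataloader,), {}),
    nprocs=accelerator.nprocs,
)
```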