Profiling and debugging on CUDA devices
For this to work, you'll have to have the right to access perf counters on your machine (for example with sudo or inside a Docker container). See: https://docs.nvidia.com/deeplearning/frameworks/dlprof-user-guide/index.html.
Make sure that your entrypoint is an executable script. That means that it must start with a she-bang on the top line, e.g.
and have execution rightsLastly it's recommended to add
To your script in the beginning enable extra annotations of PyTorch functions.You can then use dlprof to profile.
PYQ_LOG_LEVEL=info QADENCE_LOG_LEVEL=debug dlprof --mode=pytorch examples/backends/differentiable_backend.py
For example to achieve this through Docker we can start a session in the shell, also mounting our local Qadence version and PyQTorch (Both optional, but you probably need to mount your script at least)
$ docker run --rm --gpus=1 --shm-size=1g --ulimit memlock=-1 \
--ulimit stack=67108864 -it -p8000:8000 -v./:/opt/qadence -v ../PyQ:/opt/pyqtorch pytorch:24.02-py3 bash
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
)
After this you should have a shell inside the container and you can
root@2a85826c4e7b:/workspace# cd /opt/qadence/
root@2a85826c4e7b:/opt/qadence# pip3 install -e .
root@2a85826c4e7b:/opt/qadence# pip3 install -e ../pyqtorch/
root@2a85826c4e7b:/opt/qadence# pip3 install nvidia-dlprof[pytorch]
root@2a85826c4e7b:/opt/qadence# PYQ_LOG_LEVEL=debug QADENCE_LOG_LEVEL=debug dlprof --mode=pytorch --nsys_opts="-t cuda,nvtx,cublas,cusparse,cusparse-verbose,cublas-verbose --force-overwrite true" examples/models/quantum_model.py
Where we have --force-overwrite true
to always store the latest profiling result (hence you must rename the file) if you wish to keep several. We add cublas,cusparse,cusparse-verbose,cublas-verbose
do get more details
about the numerical backend pacakges being used.