How to Debug PyTorch "In Depth"

2023-10-19 10:54:50

Preface

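We usually debug PyTorch on the Python side, which is enough for ordinary model-building work. But if we need to modify PyTorch itself, or want to study how a machine learning or deep learning system is put together, digging deeper means working at the level of the C++ source code.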

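Take the function torch.rand(3, 4) as an example. From Python we cannot step into its internal implementation with a Python debugger, nor can we find its definition, so we have no way to explore how it is actually implemented. To explore and debug PyTorch properly, we therefore need to debug its C++ side.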
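The point about torch.rand can be checked from Python itself. The sketch below uses math.sqrt from the standard library as a stand-in so that it runs without PyTorch; the assumption is that torch.rand, being bound from C++, is reported as a builtin in the same way. The helper name describe_callable is made up for illustration.

```python
import inspect
import json
import math


def describe_callable(func):
    """Report whether a callable is a compiled builtin and whether Python source exists."""
    try:
        inspect.getsource(func)
        has_source = True
    except (TypeError, OSError):
        # TypeError: builtins implemented in C/C++ have no Python source.
        # OSError: the source file could not be found or read.
        has_source = False
    return {"builtin": inspect.isbuiltin(func), "has_python_source": has_source}


# A function implemented in C: a Python debugger cannot step into it.
print(describe_callable(math.sqrt))
# A function implemented in Python, for comparison.
print(describe_callable(json.dumps))
```

With PyTorch installed, passing torch.rand instead of math.sqrt should give the same kind of answer as the first call.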

Preparation

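First we need PyCharm and VS Code (on Linux), along with a Python environment and gdb (usually already installed). Then create a virtual environment and build PyTorch from source.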

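Since we want to debug the PyTorch source, we first have to compile it. The build must be a debug build, which is selected by setting the DEBUG environment variable: the command is DEBUG=1 python setup.py install. For more detailed build steps, see the article below; they are not repeated here: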
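As a rough sketch, the same build can be launched from Python, which makes it explicit that DEBUG is an environment variable seen only by the build process. The function names and the source_dir parameter are hypothetical; only the DEBUG=1 python setup.py install command comes from the text above.

```python
import os
import subprocess
import sys


def debug_build_env(base_env=None):
    """Return a copy of the environment with DEBUG=1 set for the build."""
    env = dict(os.environ if base_env is None else base_env)
    env["DEBUG"] = "1"
    return env


def debug_build_command(python=sys.executable):
    """Argument list equivalent to `python setup.py install`."""
    return [python, "setup.py", "install"]


def run_debug_build(source_dir):
    """Run the build inside the PyTorch source checkout (this takes a long time)."""
    subprocess.run(
        debug_build_command(),
        cwd=source_dir,          # hypothetical path to the PyTorch checkout
        env=debug_build_env(),
        check=True,              # raise if the build fails
    )
```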

  • A Concise Guide to Building PyTorch from Source (Pytorch源码编译简明指南)

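Once PyTorch is built, open the PyTorch directory in VS Code as a whole project and click the debug icon on the left. The first time debugging is started, VS Code prompts us to add a launch.json configuration file, which lives in the .vscode directory.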

Next, edit launch.json. The main change is to set the program field to the path of the Python interpreter; nothing else needs to change:

"name": "(gdb) Attach",
"type": "cppdbg",
"request": "attach",
"program": "/home/prototype/anaconda3/envs/pytorch/bin/python", // change this to the path of the python that runs your pytorch
"processId": "${command:pickProcess}",
"MIMode": "gdb",
"setupCommands": [
    {
        "description": "Enable pretty-printing for gdb",
        "text": "-enable-pretty-printing",
        "ignoreFailures": true
    }
]
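
The fragment above is a single configuration entry; VS Code expects it inside a launch.json that has a version field and a configurations list. A minimal sketch that generates the whole file, with the program field filled in from sys.executable, could look like this. The function names are hypothetical, and the script is assumed to be run with the same Python that runs PyTorch.

```python
import json
import sys
from pathlib import Path


def build_launch_config(python_path=sys.executable):
    """Build the launch.json content with one gdb attach configuration."""
    attach = {
        "name": "(gdb) Attach",
        "type": "cppdbg",
        "request": "attach",
        "program": python_path,  # interpreter that runs PyTorch
        "processId": "${command:pickProcess}",
        "MIMode": "gdb",
        "setupCommands": [
            {
                "description": "Enable pretty-printing for gdb",
                "text": "-enable-pretty-printing",
                "ignoreFailures": True,
            }
        ],
    }
    return {"version": "0.2.0", "configurations": [attach]}


def write_launch_config(project_dir, python_path=sys.executable):
    """Write .vscode/launch.json under project_dir and return its path."""
    vscode_dir = Path(project_dir) / ".vscode"
    vscode_dir.mkdir(parents=True, exist_ok=True)
    target = vscode_dir / "launch.json"
    target.write_text(json.dumps(build_launch_config(python_path), indent=4) + "\n")
    return target
```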

Start Debugging

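The debugging approach works like this:

  • First, run the Python code and get the process ID of the running process
  • Then, attach gdb to that process ID to debug the C++ code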

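Because attaching takes some time, the program could run past the breakpoints we set in the C++ code before gdb has attached. To prevent that, add a sleep statement at the very start of the .py file to be run, so the program waits for a while. The duration is up to you; I usually use time.sleep(50). Also set the breakpoints in the C++ code in advance: in VS Code, click next to the line number where you want execution to stop.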
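A minimal sketch of the script side of this step: print the process ID so it can be entered in VS Code, then wait before calling into PyTorch. The helper name announce_and_wait is made up, and the torch.rand(3, 4) call appears only in comments so the block runs without PyTorch installed.

```python
import os
import time


def announce_and_wait(delay_seconds=50):
    """Print this process's ID, then sleep so gdb has time to attach."""
    pid = os.getpid()
    print(f"PID {pid}: attach gdb within {delay_seconds} seconds", flush=True)
    time.sleep(delay_seconds)
    return pid


# Intended use at the top of the script being debugged:
#
#     import torch
#     announce_and_wait(50)
#     x = torch.rand(3, 4)  # the C++ breakpoint is reached from this call
```

Printing the ID from the script itself is a convenience; the process ID shown by the IDE console works just as well.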

Then run the PyTorch code in debug mode (click the debug button in PyCharm). The console shows that the process ID at this point is 28536.

Click debug in VS Code, using the configuration we set up earlier.

Now enter the process ID from before to attach. Note that the system may ask for root privileges at this point; enter y to confirm.

After waiting a while (the time.sleep duration), the program stops at the breakpoint we set. We have now gone from PyTorch's Python frontend into the C++ backend.

The debug output is as follows:

GNU gdb (Ubuntu 8.2-0ubuntu1~16.04.1) 8.2
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word".
Warning: Debuggee TargetArchitecture not detected, assuming x86_64.
=cmd-param-changed,param="pagination",value="off"
[New LWP 15524]
[New LWP 15525]
[New LWP 15528]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007fe9efc605d3 in select () at ../sysdeps/unix/syscall-template.S:84
Loaded '/lib/x86_64-linux-gnu/libpthread.so.0'. Symbols loaded.
Loaded '/lib/x86_64-linux-gnu/libc.so.6'. Symbols loaded.
Loaded '/lib/x86_64-linux-gnu/libdl.so.2'. Symbols loaded.
Loaded '/lib/x86_64-linux-gnu/libutil.so.1'. Symbols loaded.
Loaded '/lib/x86_64-linux-gnu/librt.so.1'. Symbols loaded.
Loaded '/lib/x86_64-linux-gnu/libm.so.6'. Symbols loaded.
Loaded '/lib64/ld-linux-x86-64.so.2'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/lib-dynload/_heapq.cpython-36m-x86_64-linux-gnu.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/lib-dynload/_opcode.cpython-36m-x86_64-linux-gnu.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/lib-dynload/zlib.cpython-36m-x86_64-linux-gnu.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/lib-dynload/../../libz.so.1'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/lib-dynload/_bz2.cpython-36m-x86_64-linux-gnu.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/lib-dynload/_lzma.cpython-36m-x86_64-linux-gnu.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/lib-dynload/../../liblzma.so.5'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/lib-dynload/grp.cpython-36m-x86_64-linux-gnu.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/lib-dynload/_posixsubprocess.cpython-36m-x86_64-linux-gnu.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/lib-dynload/select.cpython-36m-x86_64-linux-gnu.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/lib-dynload/math.cpython-36m-x86_64-linux-gnu.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/lib-dynload/_hashlib.cpython-36m-x86_64-linux-gnu.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/lib-dynload/../../libcrypto.so.1.1'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/lib-dynload/_blake2.cpython-36m-x86_64-linux-gnu.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/lib-dynload/_sha3.cpython-36m-x86_64-linux-gnu.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/lib-dynload/_bisect.cpython-36m-x86_64-linux-gnu.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/lib-dynload/_random.cpython-36m-x86_64-linux-gnu.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/lib-dynload/_struct.cpython-36m-x86_64-linux-gnu.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/lib-dynload/binascii.cpython-36m-x86_64-linux-gnu.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/lib-dynload/_socket.cpython-36m-x86_64-linux-gnu.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/lib-dynload/_datetime.cpython-36m-x86_64-linux-gnu.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/lib-dynload/_ssl.cpython-36m-x86_64-linux-gnu.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/lib-dynload/../../libssl.so.1.1'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/lib-dynload/../../libffi.so.6'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/site-packages/numpy/_mklinit.cpython-36m-x86_64-linux-gnu.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/site-packages/numpy/../../../libmkl_rt.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/site-packages/numpy/../../../libiomp5.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/bin/../lib/libgcc_s.so.1'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/site-packages/numpy/core/_multiarray_umath.cpython-36m-x86_64-linux-gnu.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/lib-dynload/_pickle.cpython-36m-x86_64-linux-gnu.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/site-packages/numpy/core/_multiarray_tests.cpython-36m-x86_64-linux-gnu.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/site-packages/numpy/linalg/lapack_lite.cpython-36m-x86_64-linux-gnu.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/site-packages/numpy/linalg/_umath_linalg.cpython-36m-x86_64-linux-gnu.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/lib-dynload/_decimal.cpython-36m-x86_64-linux-gnu.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/site-packages/numpy/fft/fftpack_lite.cpython-36m-x86_64-linux-gnu.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/site-packages/mkl_fft/_pydfti.cpython-36m-x86_64-linux-gnu.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/site-packages/numpy/random/mtrand.cpython-36m-x86_64-linux-gnu.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/site-packages/numpy/../../../libmkl_core.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/site-packages/numpy/../../../libmkl_intel_thread.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/site-packages/numpy/../../../libmkl_intel_lp64.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/site-packages/numpy/../../../libmkl_avx512.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/site-packages/numpy/../../../libmkl_vml_avx512.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/lib-dynload/_json.cpython-36m-x86_64-linux-gnu.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/lib-dynload/pyexpat.cpython-36m-x86_64-linux-gnu.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/lib-dynload/fcntl.cpython-36m-x86_64-linux-gnu.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/lib-dynload/termios.cpython-36m-x86_64-linux-gnu.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/lib-dynload/resource.cpython-36m-x86_64-linux-gnu.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/lib-dynload/array.cpython-36m-x86_64-linux-gnu.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/lib-dynload/_multiprocessing.cpython-36m-x86_64-linux-gnu.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/lib-dynload/_asyncio.cpython-36m-x86_64-linux-gnu.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/lib-dynload/_sqlite3.cpython-36m-x86_64-linux-gnu.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/lib-dynload/../../libsqlite3.so.0'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/lib-dynload/unicodedata.cpython-36m-x86_64-linux-gnu.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/lib-dynload/_lsprof.cpython-36m-x86_64-linux-gnu.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/lib/libshm.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/lib/libtorch_python.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/libstdc++.so.6'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/lib/libtorch.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/lib/libc10.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/libmkl_gnu_thread.so'. Symbols loaded.
Loaded '/home/prototype/anaconda3/envs/pytorch/lib/libgomp.so.1'. Symbols loaded.
Loaded '/usr/lib/libmpi_cxx.so.1'. Symbols loaded.
Loaded '/usr/lib/libmpi.so.12'. Symbols loaded.
Loaded '/usr/lib/x86_64-linux-gnu/libnuma.so.1'. Symbols loaded.
Loaded '/usr/lib/libibverbs.so.1'. Symbols loaded.
Loaded '/usr/lib/libopen-rte.so.12'. Symbols loaded.
Loaded '/usr/lib/libopen-pal.so.13'. Symbols loaded.
Loaded '/usr/lib/x86_64-linux-gnu/libhwloc.so.5'. Symbols loaded.
Loaded '/usr/lib/x86_64-linux-gnu/libltdl.so.7'. Symbols loaded.
[Switching to thread 4 (Thread 0x7fe9de0e0700 (LWP 15528))](running)
=thread-selected,id="4"

Thread 1 "python" hit Breakpoint 4, torch::autograd::THPVariable_rand (self_=0x0, args=0x7fe9df2cff08, kwargs=0x0) at ../torch/csrc/autograd/generated/python_torch_functions.cpp:8134
8134	  }, /*traceable=*/true);
Execute debugger commands using "-exec <command>", for example "-exec info registers" will list registers in use (when GDB is the debugger)

As we can see, Breakpoint 4 was hit successfully. From here we can happily debug PyTorch with VS Code.
