I. Environment
```shell
conda create -n DL python=3.11
conda activate DL
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
conda install jupyter
conda install matplotlib tensorboard
conda install tensorboardX
```
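Several of the errors in Section IV come down to package versions, so it can help to check what actually ended up in the environment. The sketch below is a small standard-library helper (the function name and package list are our own choice, not part of any tool) that reports the installed version of each package, or `None` if it is missing.

```python
from importlib import metadata

# Packages relevant to this article; adjust the list as needed (our own choice).
PACKAGES = ["torch", "torchvision", "tensorboard", "tensorboardX", "protobuf", "Pillow"]


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if it is not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:12s} {version or 'not installed'}")
```

Run it inside the activated `DL` environment; a `protobuf` version above 3.20.x or a `Pillow` version of 10.0.0 or later points directly at the two errors described below.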
II. TensorBoard
1. Using TensorBoardX
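TensorBoardX is a third-party library that makes TensorBoard usable from PyTorch. You can use it to log the loss, accuracy, histograms of model parameters, and other information during training, and then visualize them in TensorBoard.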
a. Installing TensorBoardX
```shell
conda install tensorboardX
```
or
```shell
pip install tensorboardX
```
b. Usage example
Logging the training loss with TensorBoardX in PyTorch:
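```python
from tensorboardX import SummaryWriter

# Create a SummaryWriter; event files are written to the 'logs' directory
writer = SummaryWriter('logs')

num_epochs = 10
for epoch in range(num_epochs):
    # ... train for one epoch here and compute train_loss ...
    train_loss = 1.0 / (epoch + 1)  # placeholder value for illustration only
    # Record this epoch's loss under the tag 'Train/Loss'
    writer.add_scalar('Train/Loss', train_loss, epoch)

# Close the SummaryWriter after training so pending events are flushed to disk
writer.close()
```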
2. PyTorch's built-in TensorBoard
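Since version 1.2, PyTorch also ships built-in TensorBoard support: torch.utils.tensorboard.SummaryWriter records training information in the same way as the example above (it requires the tensorboard package to be installed). Only the import changes:
```python
from torch.utils.tensorboard import SummaryWriter
```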
3. Starting the TensorBoard server
Start TensorBoard with a command of the following form (the default port is 6006):
```shell
tensorboard --logdir=path_to_your_logs
```
For example:
```shell
tensorboard --logdir=./Norm --port=6005
```
Here the logs are stored in the Norm directory, and TensorBoard runs on port 6005.
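When the default port 6006 is already taken (for example by another TensorBoard instance), you need `--port` as in the second command. The following is a minimal standard-library sketch, with hypothetical helper names, that checks whether a local port is in use, picks the first free one starting at 6006, and builds the command in the form shown above.

```python
import shlex
import socket


def port_is_free(port, host="localhost"):
    """Return True if nothing is currently accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex((host, port)) != 0


def first_free_port(start=6006, attempts=20, host="localhost"):
    """Return the first free port in [start, start + attempts)."""
    for port in range(start, start + attempts):
        if port_is_free(port, host):
            return port
    raise RuntimeError(f"no free port in {start}-{start + attempts - 1}")


def tensorboard_command(logdir, port=6006):
    """Build the `tensorboard --logdir=... --port=...` command as an argument list."""
    return ["tensorboard", f"--logdir={logdir}", f"--port={port}"]


if __name__ == "__main__":
    cmd = tensorboard_command("./Norm", first_free_port())
    print(shlex.join(cmd))  # paste into a terminal, or pass cmd to subprocess.Popen
```

Returning an argument list rather than a single string means the same value can be printed for a terminal or passed straight to `subprocess.Popen` without shell quoting issues.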
III. Hands-on
```python
import torch
from torch.utils.tensorboard import SummaryWriter  # tensorboardX's SummaryWriter works the same way
from torchvision.utils import make_grid

# Assumes generator, discriminator, data_loader, device, num_epochs, n_critic,
# batch_size, display_step, d_optimizer, g_optimizer, criterion and the helpers
# discriminator_train_step / generator_train_step are defined earlier.

# Create a SummaryWriter for logging to TensorBoard (default log dir: ./runs/...)
writer = SummaryWriter()

for epoch in range(num_epochs):
    print('Starting epoch {}...'.format(epoch), end=' ')
    # Iterate through the data loader
    for i, (images, labels) in enumerate(data_loader):
        # Global step that keeps counting across epochs (starts at 1)
        step = epoch * len(data_loader) + i + 1
        real_images = images.to(device)
        labels = labels.to(device)
        generator.train()

        # Perform multiple discriminator training steps, accumulating the loss
        d_loss = 0
        for _ in range(n_critic):
            d_loss += discriminator_train_step(len(real_images), discriminator,
                                               generator, d_optimizer, criterion,
                                               real_images, labels, device)

        # Perform a single generator training step
        g_loss = generator_train_step(batch_size, discriminator, generator,
                                      g_optimizer, criterion, device)

        # Write both losses to one chart (discriminator loss averaged over n_critic steps)
        writer.add_scalars('scalars', {'g_loss': g_loss, 'd_loss': d_loss / n_critic}, step)

        # Every display_step steps, log a 3x3 grid of samples for labels 0-8
        if step % display_step == 0:
            generator.eval()
            with torch.no_grad():
                z = torch.randn(9, 100, device=device)
                sample_labels = torch.arange(9, device=device)
                sample_images = generator(z, sample_labels).unsqueeze(1)
            grid = make_grid(sample_images, nrow=3, normalize=True)
            writer.add_image('sample_image', grid, step)
    print('Done!')

writer.close()
```
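The global step `epoch * len(data_loader) + i + 1` keeps increasing across epochs, so the x-axis of every chart is continuous and `step % display_step == 0` fires at evenly spaced points regardless of epoch boundaries. The pure-Python sketch below (the function name and the small numbers are only for illustration) lists which steps would receive a sample image.

```python
def image_log_steps(num_epochs, batches_per_epoch, display_step):
    """Return the global steps at which the training loop above would log images."""
    steps = []
    for epoch in range(num_epochs):
        for i in range(batches_per_epoch):
            # Same formula as in the training loop
            step = epoch * batches_per_epoch + i + 1
            if step % display_step == 0:
                steps.append(step)
    return steps


if __name__ == "__main__":
    # 2 epochs of 5 batches, images every 3 steps -> steps 3, 6 and 9
    print(image_log_steps(2, 5, 3))
```

Using `i` alone as the step would restart at 0 every epoch and make later epochs overwrite earlier points on the same x positions.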
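- Data format: the logs are TensorBoard event files (events.out.tfevents.*).
- Default location: when SummaryWriter() is called without a directory, it writes to runs/<date-time>_<hostname>.
- Rename: the log directory can be renamed freely; this example uses Norm.
- In the terminal, run:
```shell
tensorboard --logdir=./Norm
```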
Open the link printed in the terminal (or enter http://localhost:6006 in a browser) to reach the TensorBoard web UI.
When visualizing deep learning models with TensorBoard, the most commonly used features are Scalars, Images, and Time Series:
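1. SCALARS
Scalars presents scalar values recorded during training, such as the loss, accuracy, and learning rate.
- The Scalars view plots how these values change over the training steps;
- Several scalars can be compared at the same time to analyze how they relate to each other and how they trend.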
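Chart controls:
- Toggle y-axis log scale: switches the y-axis between a linear and a logarithmic scale.
- Alt + Scroll to Zoom: hold Alt and scroll the mouse wheel over the chart to zoom.
- Fit domain to data: in plain terms, restores the original view after zooming with one click.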
Show data download links: enables download links for the chart data (CSV or JSON).
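Once download links are enabled, a scalar series can be exported and analyzed outside TensorBoard. The sketch below parses such a CSV with the standard library; it assumes the header row is `Wall time,Step,Value`, which is what we believe the Scalars download produces, and the sample rows in the usage part are made up.

```python
import csv
import io


def read_scalar_csv(text):
    """Parse a scalar CSV exported from TensorBoard into (steps, values).

    Assumes the header 'Wall time,Step,Value'.
    """
    reader = csv.DictReader(io.StringIO(text))
    steps, values = [], []
    for row in reader:
        steps.append(int(row["Step"]))
        values.append(float(row["Value"]))
    return steps, values


if __name__ == "__main__":
    # Made-up example content standing in for a downloaded file
    sample = "Wall time,Step,Value\n1700000000.0,1,0.9\n1700000060.0,2,0.7\n"
    print(read_scalar_csv(sample))
```

For a real export, pass the file's contents, for example `read_scalar_csv(open(path, encoding="utf-8").read())`.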
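Ignore outliers in chart scaling: excludes extreme values when computing the y-axis range, so a few spikes do not flatten the rest of the curve.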
Smoothing: smooths the curves; the slider sets the smoothing weight (0 means no smoothing).
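As far as we understand, the Smoothing slider applies a debiased exponential moving average to each series, with the slider value as the weight. The sketch below reproduces that idea in plain Python (the function name is ours, and TensorBoard's own code also special-cases non-finite values, which is omitted here); it is useful for reproducing a smoothed curve from downloaded data.

```python
def tensorboard_smooth(values, weight):
    """Debiased exponential moving average; weight in [0, 1), 0 = no smoothing."""
    smoothed = []
    last = 0.0
    num_accum = 0
    for v in values:
        # Ordinary EMA update, starting from 0.0
        last = last * weight + (1 - weight) * v
        num_accum += 1
        # Divide out the bias toward the initial 0.0 so early points are not dragged down
        debias = 1 - weight ** num_accum if weight != 1 else 1.0
        smoothed.append(last / debias)
    return smoothed


if __name__ == "__main__":
    print(tensorboard_smooth([0.0, 10.0, 10.0], 0.5))
```

Without the debiasing division, the first smoothed points would sit far below the raw values because the average starts at zero; with it, a constant series stays constant for any weight.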
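Horizontal Axis: chooses what the x-axis shows.
- STEP: the iteration (step) count
- RELATIVE: time elapsed since the first logged point
- WALL: wall-clock time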
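All three x-axes come from the same logged points: each point carries a step and a wall-clock timestamp, and RELATIVE is just the timestamp minus the first one. The sketch below (hypothetical helper name) derives WALL and RELATIVE from a list of Unix timestamps; it expresses RELATIVE in hours, which is our assumption about how TensorBoard labels that axis.

```python
from datetime import datetime, timezone


def horizontal_axes(wall_times):
    """Given Unix timestamps (seconds) of logged points, return
    (WALL as UTC datetimes, RELATIVE as hours since the first point)."""
    start = wall_times[0]
    wall = [datetime.fromtimestamp(t, tz=timezone.utc) for t in wall_times]
    relative_hours = [(t - start) / 3600.0 for t in wall_times]
    return wall, relative_hours


if __name__ == "__main__":
    wall, rel = horizontal_axes([0.0, 1800.0, 7200.0])
    print(wall[0].isoformat(), rel)
```

RELATIVE is handy when comparing runs started at different times of day, since it aligns them all at zero.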
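Runs: selects which runs to display (the square checkboxes on the left allow multiple selection; the circle on the right selects a single run), which makes it easy to compare experiment results.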
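TensorBoard lists every subdirectory of `--logdir` that contains event files as a separate run, so comparing experiments mostly comes down to giving each one its own log directory. The sketch below is a hypothetical naming helper that encodes hyperparameters in the directory name; the hyperparameter names are only examples.

```python
from pathlib import Path


def run_dir(root, **hparams):
    """Build one log subdirectory per experiment, e.g. Norm/bs=64_lr=0.001.

    Keys are sorted so the same settings always map to the same name.
    """
    name = "_".join(f"{key}={value}" for key, value in sorted(hparams.items()))
    return Path(root) / name


if __name__ == "__main__":
    print(run_dir("Norm", lr=0.001, bs=64))
```

Each experiment would then create its writer with something like `SummaryWriter(log_dir=str(run_dir("Norm", lr=0.001, bs=64)))`, and `tensorboard --logdir=./Norm` shows all of them side by side in the Runs panel.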
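2. IMAGES
Images displays images produced by the model, as well as image-like information from inside the model such as intermediate-layer activations and filters (convolution kernels).
- Use Images to watch the samples generated during training;
- You can also visualize intermediate-layer feature maps to better understand the model's learning process and its feature-extraction ability.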
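To view feature maps in the Images tab, the many channels of one layer usually need to be tiled into a single image first (torchvision's `make_grid` does this for tensors, as in the hands-on code). The NumPy sketch below shows the same idea for an `(N, H, W)` array; the function name is hypothetical, and each map is min-max normalized to [0, 1] separately so that weak and strong channels are both visible.

```python
import numpy as np


def feature_maps_to_grid(maps, nrow=8, pad=1):
    """Tile feature maps of shape (N, H, W) into one 2-D image with values in [0, 1]."""
    maps = np.asarray(maps, dtype=np.float32)
    n, h, w = maps.shape
    ncol = min(nrow, n)                 # maps per grid row
    nrows = (n + ncol - 1) // ncol      # number of grid rows
    grid = np.zeros((nrows * (h + pad) + pad, ncol * (w + pad) + pad), dtype=np.float32)
    for idx, m in enumerate(maps):
        lo, hi = m.min(), m.max()
        # Per-map min-max normalization; constant maps become all zeros
        norm = (m - lo) / (hi - lo) if hi > lo else np.zeros_like(m)
        r, c = divmod(idx, ncol)
        y = pad + r * (h + pad)
        x = pad + c * (w + pad)
        grid[y:y + h, x:x + w] = norm
    return grid


if __name__ == "__main__":
    demo = np.random.rand(16, 7, 7)
    print(feature_maps_to_grid(demo, nrow=4).shape)
```

The resulting 2-D array can be logged with `writer.add_image('feature_maps', grid, step, dataformats='HW')`, since `add_image` expects channel-first data unless `dataformats` says otherwise.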
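Show actual image size: displays images at their actual size.
Brightness adjustment: adjusts image brightness; the RESET button on the right restores the default value.
Contrast adjustment: adjusts image contrast.
Runs: selects which runs to display.
Viewing different steps: drag the step slider.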
3. TIME SERIES
Combines the dashboards above (scalars, images, and so on) in a single view.
IV. Errors
1. AttributeError: module 'PIL.Image' has no attribute 'ANTIALIAS'
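Solution
Pillow 10.0.0 removed the ANTIALIAS constant. Replace Image.ANTIALIAS with either of the following (ANTIALIAS was an alias for the Lanczos filter):
```python
Image.LANCZOS
# or
Image.Resampling.LANCZOS
```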
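If the same code has to run on both old and new Pillow releases, the constant can be looked up in a version-agnostic way. The sketch below does that, and also shows a common stopgap for third-party code you cannot edit: restoring the removed `Image.ANTIALIAS` alias before that code runs. Treat the second part as a temporary workaround, not a fix; upgrading the library that still references `ANTIALIAS` is the cleaner option.

```python
from PIL import Image

# Image.Resampling exists since Pillow 9.1; older releases only have the
# module-level constant. Both resolve to the Lanczos filter.
LANCZOS = getattr(Image, "Resampling", Image).LANCZOS

# Option A: use the constant directly when resizing in your own code
img = Image.new("RGB", (64, 48))
small = img.resize((32, 24), LANCZOS)

# Option B (stopgap): re-create the alias removed in Pillow 10 so that
# unmodified code referencing Image.ANTIALIAS keeps working
if not hasattr(Image, "ANTIALIAS"):
    Image.ANTIALIAS = LANCZOS
```

Option B must run before the code that uses `Image.ANTIALIAS` is called, for example at the top of the training script.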
2. TypeError: Descriptors cannot be created directly.
```
TypeError: Descriptors cannot be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
1. Downgrade the protobuf package to 3.20.x or lower.
2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
```
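The installed protobuf version is too new.
Solution
Install tensorboard with conda. In the log below, conda installs protobuf 3.20.3, which is within the 3.20.x range the error message recommends:
```shell
conda install tensorboard
```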
```
## Package Plan ##

  environment location: E:\Software\anaconda3\envs\DL

  added / updated specs:
    - tensorboard

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    werkzeug-2.3.8             |  py311haa95532_0         445 KB  defaults
    ------------------------------------------------------------
                                           Total:         445 KB

The following NEW packages will be INSTALLED:

  protobuf           anaconda/pkgs/main/win-64::protobuf-3.20.3-py311hd77b12b_0
  werkzeug           anaconda/pkgs/main/win-64::werkzeug-2.3.8-py311haa95532_0

Proceed ([y]/n)?
```
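If downgrading is not an option, the error message itself offers a second workaround: the pure-Python protobuf implementation. The sketch below (helper name is ours) checks the installed protobuf version against the 3.20.x line mentioned in the message and, only if it is newer, sets `PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python`. The variable only takes effect if it is set before protobuf (and therefore TensorBoard) is imported, and parsing becomes much slower, as the message warns.

```python
import os
import re
from importlib import metadata


def protobuf_too_new(version):
    """True if the version is newer than the 3.20.x line suggested by the error message."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) > (3, 20)


try:
    installed = metadata.version("protobuf")
except metadata.PackageNotFoundError:
    installed = None

if installed is not None and protobuf_too_new(installed):
    # Workaround 2 from the error message: force pure-Python parsing (slower).
    # Must happen before tensorboard / protobuf is imported.
    os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"
    print(f"protobuf {installed} is newer than 3.20.x: using the pure-Python implementation")
```

Downgrading (as in the conda log above) avoids the speed penalty, so this is best kept as a fallback.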