[Deep Learning Experiments] TensorBoard Tutorial (SCALARS, IMAGES, TIME SERIES)

2024-07-30 09:19:53

I. Environment

conda create -n DL python==3.11
conda activate DL
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
conda install jupyter
conda install matplotlib tensorboard
conda install tensorboardX

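II. TensorBoard

1. Using TensorBoardX

  TensorBoardX is a third-party library that lets PyTorch code write TensorBoard logs. It can record the loss, accuracy, histograms of model parameters, and similar information during training, and TensorBoard then visualizes what was recorded.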

a. Installing TensorBoardX

conda install tensorboardX

or

pip install tensorboardX

b. Usage example

Recording the training loss with TensorBoardX in PyTorch:

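from tensorboardX import SummaryWriter

# Create a SummaryWriter and point it at the directory that will hold the logs
writer = SummaryWriter('logs')

num_epochs = 10  # placeholder value for illustration

for epoch in range(num_epochs):
    # Placeholder: replace with the loss computed by your own training step
    train_loss = 1.0 / (epoch + 1)
    # Record the loss inside the training loop
    writer.add_scalar('Train/Loss', train_loss, epoch)

# Close the SummaryWriter once training is finished
writer.close()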

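2. PyTorch's built-in TensorBoard support

  PyTorch has included built-in TensorBoard support as a stable feature since version 1.2 (it first shipped as experimental in 1.1). torch.utils.tensorboard.SummaryWriter records training information in the same way as the example above.

from torch.utils.tensorboard import SummaryWriter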

3. Starting the TensorBoard server

  Start TensorBoard with a command of the following form (the default port is 6006):

tensorboard --logdir=path_to_your_logs

Example:

tensorboard --logdir=./Norm --port=6005

Here the log files are stored in the Norm directory, and TensorBoard serves on port 6005.
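When the server has to be started from a Python script rather than typed by hand, the same command line can be assembled programmatically. The following is a minimal sketch, not part of the original tutorial: the helper names tensorboard_command and launch_tensorboard are hypothetical, and only the --logdir and --port flags shown above are used.

```python
import subprocess


def tensorboard_command(logdir, port=6006):
    """Build the argument list for the tensorboard CLI (6006 is the default port)."""
    return ["tensorboard", f"--logdir={logdir}", f"--port={port}"]


def launch_tensorboard(logdir, port=6006):
    """Start TensorBoard as a child process and return its Popen handle.

    Assumes the tensorboard executable is on PATH (e.g. the DL conda
    environment is active). Call .terminate() on the handle to stop it.
    """
    return subprocess.Popen(tensorboard_command(logdir, port))


# Print the command that corresponds to the example above
print(" ".join(tensorboard_command("./Norm", 6005)))
```

Calling launch_tensorboard("./Norm", 6005) would start the server in the background; the sketch only prints the command so that it has no side effects when run.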

III. Hands-on example

import torch
# Assumed import; tensorboardX's SummaryWriter can be used the same way
from torch.utils.tensorboard import SummaryWriter
from torchvision.utils import make_grid

# The models, optimizers, criterion, data_loader, device, the two *_train_step
# helpers, and the hyperparameters (num_epochs, n_critic, batch_size,
# display_step) are assumed to be defined earlier in the training script.

# Create a SummaryWriter for logging information to TensorBoard
writer = SummaryWriter()

for epoch in range(num_epochs):
    print('Starting epoch {}...'.format(epoch), end=' ')

    # Iterate through the data loader
    for i, (images, labels) in enumerate(data_loader):
        # Global step: 1-based count of batches processed so far
        step = epoch * len(data_loader) + i + 1
        real_images = images.to(device)
        labels = labels.to(device)
        generator.train()

        d_loss = 0
        # Perform multiple discriminator training steps, accumulating the loss
        # so that d_loss / n_critic below is the mean over those steps
        for _ in range(n_critic):
            d_loss += discriminator_train_step(len(real_images), discriminator,
                                               generator, d_optimizer, criterion,
                                               real_images, labels,
                                               device)

        # Perform a single generator training step
        g_loss = generator_train_step(batch_size, discriminator, generator, g_optimizer, criterion, device)

        # Write both losses to one chart in TensorBoard
        writer.add_scalars('scalars', {'g_loss': g_loss, 'd_loss': (d_loss / n_critic)}, step)

        # Display sample images at certain steps
        if step % display_step == 0:
            generator.eval()
            z = torch.randn(9, 100).to(device)
            labels = torch.arange(9).to(device)  # labels 0..8, one per sample
            sample_images = generator(z, labels).unsqueeze(1)
            grid = make_grid(sample_images, nrow=3, normalize=True)
            writer.add_image('sample_image', grid, step)

    print('Done!')

# Flush pending events and close the log file
writer.close()
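The step value passed to add_scalars and add_image in the loop is a global batch counter, computed as epoch * len(data_loader) + i + 1. A minimal sketch of that arithmetic follows; the helper name global_step and the sizes (3 epochs of 4 batches) are made up for illustration. It shows that the counter starts at 1 and keeps increasing by one across epoch boundaries, which is what gives TensorBoard a continuous horizontal axis.

```python
def global_step(epoch, batch_index, batches_per_epoch):
    """1-based count of batches processed so far, continuous across epochs."""
    return epoch * batches_per_epoch + batch_index + 1


batches_per_epoch = 4  # hypothetical stand-in for len(data_loader)
num_epochs = 3         # hypothetical

steps = [
    global_step(epoch, i, batches_per_epoch)
    for epoch in range(num_epochs)
    for i in range(batches_per_epoch)
]
print(steps)
```

Without the epoch term, every epoch would write to steps 1 through 4 again and the curves would overlap themselves.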
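  • Log data layout:
    • Default: SummaryWriter() called without arguments writes its event files to a new subdirectory of runs/
    • Renamed: (screenshot omitted; the command below reads from a directory named Norm)
  • In a terminal, run:

 tensorboard --logdir=./Norm

  Click the link printed in the terminal (or enter http://localhost:6006 in a browser) to open the TensorBoard web interface.

  When visualizing a deep learning model with TensorBoard, the most commonly used dashboards are Scalars, Images, and Time Series: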

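1. SCALARS

  The Scalars dashboard plots scalar values recorded during training, such as the loss, accuracy, and learning rate.

  • It shows how each scalar changes as the training steps progress;
  • Several scalars can be plotted together to compare their trends and analyze how they relate.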
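Controls on the Scalars page (screenshots omitted):

  • Toggle y-axis log scale: switches the Y axis between linear and logarithmic scale.
  • Alt + Scroll to Zoom: hold Alt and scroll the mouse wheel to zoom in or out.
  • Fit domain to data: in plain terms, restores the view in one click after zooming.
  • Show data download links: turns on the download links for the chart data.
  • Ignore outliers in chart scaling
  • Smoothing: smooths the curves.
  • Horizontal Axis:
    • STEP (iteration count)
    • RELATIVE (time elapsed since the start of the run)
    • WALL (wall-clock time)
  • Runs: selects which runs to display, which is how experiment results are compared. The checkboxes on the left allow multiple selections; the circles on the right select a single run.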
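2. IMAGES

  The Images dashboard displays images generated by the model, as well as image-like data such as intermediate-layer activations and filters.

  • It can show the sample images generated during training;
  • It can also show intermediate feature maps, which helps in understanding what the model learns and how well it extracts features.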
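Controls on the Images page (screenshots omitted):

  • Show actual image size: displays each image at its actual size.
  • Brightness adjustment: adjusts the brightness; RESET on the right restores the default value.
  • Contrast adjustment: adjusts the contrast.
  • Runs: selects which runs to display.
  • Step slider: drag it to view the image logged at a different step.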

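3. TIME SERIES

  The Time Series dashboard combines the content above: the scalars and images appear together in a single view.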

IV. Errors

1. AttributeError: module 'PIL.Image' has no attribute 'ANTIALIAS'

Solution

Pillow 10.0.0 removed the ANTIALIAS constant. Replace it with either of the following names:

Image.LANCZOS
Image.Resampling.LANCZOS
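For code that has to run against several Pillow versions, the choice between the two names can be made at run time. The sketch below is an illustration rather than part of the original fix: lanczos_filter is a hypothetical helper, and the two SimpleNamespace objects are stand-ins for the PIL.Image module so that the sketch runs without Pillow installed. It assumes that newer Pillow releases (9.1 and later, as far as I know) expose Image.Resampling, while older ones only have the module-level constant.

```python
from types import SimpleNamespace


def lanczos_filter(image_module):
    """Return the LANCZOS resampling constant from a PIL.Image-like module."""
    resampling = getattr(image_module, "Resampling", None)
    if resampling is not None:
        # Newer layout: the constant lives on the Image.Resampling enum
        return resampling.LANCZOS
    # Older layout: only the module-level constant exists
    return image_module.LANCZOS


# Stand-ins for the two module layouts (not the real PIL.Image module)
new_style = SimpleNamespace(
    Resampling=SimpleNamespace(LANCZOS="enum-lanczos"),
    LANCZOS="module-lanczos",
)
old_style = SimpleNamespace(LANCZOS="module-lanczos", ANTIALIAS="module-lanczos")

print(lanczos_filter(new_style))  # enum-lanczos
print(lanczos_filter(old_style))  # module-lanczos
```

With the real library the call would look like image.resize(size, lanczos_filter(Image)) after from PIL import Image.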

2. TypeError: Descriptors cannot be created directly.

TypeError: Descriptors cannot be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

The installed protobuf version is too new.
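Before changing anything, it can help to check which protobuf version the environment actually has. The sketch below uses only the standard library; the helper names are hypothetical, and the (3, 20) threshold is taken from the error message's own advice ("Downgrade the protobuf package to 3.20.x or lower"). Whether a downgrade is really required depends on the packages that ship the outdated generated code.

```python
from importlib import metadata


def major_minor(version):
    """Extract (major, minor) from a version string such as '3.20.3'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def needs_downgrade(version, newest_ok=(3, 20)):
    """True if the version is newer than the 3.20.x line named in the error."""
    return major_minor(version) > newest_ok


def installed_protobuf_version():
    """Return the installed protobuf version, or None if it is not installed."""
    try:
        return metadata.version("protobuf")
    except metadata.PackageNotFoundError:
        return None


version = installed_protobuf_version()
if version is None:
    print("protobuf is not installed in this environment")
elif needs_downgrade(version):
    print(version, "is newer than 3.20.x")
else:
    print(version, "is 3.20.x or lower")
```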

Solution

Reinstall tensorboard through conda; as the package plan below shows, this installs protobuf 3.20.3:

conda install tensorboard

## Package Plan ##

  environment location: E:\Software\anaconda3\envs\DL

  added / updated specs:
    - tensorboard


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    werkzeug-2.3.8             |  py311haa95532_0         445 KB  defaults
    ------------------------------------------------------------
                                           Total:         445 KB

The following NEW packages will be INSTALLED:

  protobuf           anaconda/pkgs/main/win-64::protobuf-3.20.3-py311hd77b12b_0
  werkzeug           anaconda/pkgs/main/win-64::werkzeug-2.3.8-py311haa95532_0


Proceed ([y]/n)?
