TensorFlow Study Notes (18): Multiple GPUs

2019-05-26

Distributed TensorFlow

Multiple GPUs

How to set up the training system

(1) Each GPU holds a replica of the model. (2) The model parameters are updated synchronously across all replicas.
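
For the synchronous update in (2) to work, all replicas must share a single set of parameters. A minimal sketch of how this is commonly done, assuming TF 1.x; the helper name _variable_on_cpu follows the official CIFAR-10 example:

import tensorflow as tf

def _variable_on_cpu(name, shape, initializer):
    # Storing variables on the CPU means every GPU tower reads the same
    # copy, and the averaged gradients update it in a single place.
    with tf.device('/cpu:0'):
        var = tf.get_variable(name, shape, initializer=initializer)
    return var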

Key abstractions

  • The function that computes inference and gradients for a single replica is called a tower; tf.name_scope() is used to prefix every op name inside the tower
  • tf.device('/gpu:0') is used to pin a tower's ops to a specific device (see the sketch after this list)
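
A toy illustration (not from the original post, assuming TF 1.x graph mode) of how the two contexts combine, giving each tower uniquely prefixed op names on its own GPU:

import tensorflow as tf

TOWER_NAME = 'tower'

for i in range(2):
    with tf.device('/gpu:%d' % i):
        with tf.name_scope('%s_%d' % (TOWER_NAME, i)):
            # Ops built here are placed on /gpu:i and named with the
            # tower prefix, e.g. tower_0/weights, tower_1/weights.
            w = tf.constant(1.0, name='weights')
            print(w.op.name)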

Skeleton (assumes lr, FLAGS.num_gpus, and TOWER_NAME are defined elsewhere):

with tf.Graph().as_default(), tf.device('/cpu:0'):
    # Create an optimizer that performs gradient descent.
    opt = tf.train.GradientDescentOptimizer(lr)
    tower_grads = []
    for i in range(FLAGS.num_gpus):
        with tf.device('/gpu:%d' % i):
            with tf.name_scope('%s_%d' % (TOWER_NAME, i)) as scope:
                # Define your model (ops and variables) here.

                # Loss for this tower.
                loss = yourloss  # placeholder: replace with your model's loss
                # Reuse variables for the next tower.
                tf.get_variable_scope().reuse_variables()

                # Calculate the gradients for the batch of data on this tower.
                grads = opt.compute_gradients(loss)

                # Keep track of the gradients across all towers.
                tower_grads.append(grads)
    # We must calculate the mean of each gradient. Note that this is the
    # synchronization point across all towers.
    grads = average_gradients(tower_grads)

    # Apply the gradients to adjust the shared variables.
    apply_gradient_op = opt.apply_gradients(grads)
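
average_gradients is not defined in the skeleton; below is a sketch in the spirit of the official CIFAR-10 multi-GPU example, assuming every tower yields (gradient, variable) pairs for the same variables in the same order. It averages each variable's gradients over all towers, which is the synchronization point noted above:

import tensorflow as tf

def average_gradients(tower_grads):
    """Average the gradient for each shared variable across all towers."""
    average_grads = []
    for grad_and_vars in zip(*tower_grads):
        # grad_and_vars is ((grad_gpu0, var_gpu0), ..., (grad_gpuN, var_gpuN))
        # for one variable. Stack the gradients and average over towers.
        grads = [tf.expand_dims(g, 0) for g, _ in grad_and_vars]
        grad = tf.reduce_mean(tf.concat(grads, 0), 0)
        # The variable is shared across towers, so the first tower's
        # reference to it is enough.
        average_grads.append((grad, grad_and_vars[0][1]))
    return average_grads

Because the averaged gradients are applied once by opt.apply_gradients, every replica sees identical parameters after each step.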

Source code: the skeleton follows TensorFlow's official CIFAR-10 multi-GPU example (cifar10_multi_gpu_train.py).
