Running tensorflow-gpu on a CPU-only host

2020-08-06 10:08:08

A tensorflow-gpu image is meant to run on a GPU host, but what happens if the container is scheduled onto a host that has no GPU?

# Import tensorflow
# python -c "import tensorflow"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/__init__.py", line 22, in <module>
    from tensorflow.python import pywrap_tensorflow  # pylint: disable=unused-import
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory


Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/install_sources#common_installation_problems

for some common reasons and solutions.  Include the entire stack trace
above this error message when asking for help.

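A tensorflow-gpu image would normally be expected to need a GPU, but what if a user wants to run it on CPU? The requirement is not very reasonable: anyone who picks tensorflow-gpu should be running on a GPU, so why run it on CPU at all?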

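Under the current scheduling logic, tasks of this kind are scheduled onto CPU-only machines. These machines have no CUDA libraries installed and do not use nvidia-docker, so when a GPU image runs import tensorflow it cannot find the CUDA libraries, and the import fails.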

# Run this command
# LD_DEBUG=libs python -c "import tensorflow"
ib/x86_64:/usr/lib		(system search path)
       475:	  trying file=/lib/x86_64-linux-gnu/tls/x86_64/libcuda.so.1
       475:	  trying file=/lib/x86_64-linux-gnu/tls/libcuda.so.1
       475:	  trying file=/lib/x86_64-linux-gnu/x86_64/libcuda.so.1
       475:	  trying file=/lib/x86_64-linux-gnu/libcuda.so.1
       475:	  trying file=/usr/lib/x86_64-linux-gnu/tls/x86_64/libcuda.so.1
       475:	  trying file=/usr/lib/x86_64-linux-gnu/tls/libcuda.so.1
       475:	  trying file=/usr/lib/x86_64-linux-gnu/x86_64/libcuda.so.1
       475:	  trying file=/usr/lib/x86_64-linux-gnu/libcuda.so.1
       475:	  trying file=/lib/tls/x86_64/libcuda.so.1
       475:	  trying file=/lib/tls/libcuda.so.1
       475:	  trying file=/lib/x86_64/libcuda.so.1
       475:	  trying file=/lib/libcuda.so.1
       475:	  trying file=/usr/lib/tls/x86_64/libcuda.so.1
       475:	  trying file=/usr/lib/tls/libcuda.so.1
       475:	  trying file=/usr/lib/x86_64/libcuda.so.1
       475:	  trying file=/usr/lib/libcuda.so.1
       475:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/__init__.py", line 22, in <module>
    from tensorflow.python import pywrap_tensorflow  # pylint: disable=unused-import
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
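
The loader trace above shows the dynamic linker trying every system path for libcuda.so.1 and finding nothing. As a minimal sketch, the same condition can be probed from Python before importing tensorflow by asking ctypes to load the library. The helper name cuda_driver_available is hypothetical and not part of TensorFlow; the only assumption is that the missing library is the libcuda.so.1 named in the error.

```python
import ctypes


def cuda_driver_available(soname="libcuda.so.1"):
    """Return True if the dynamic loader can open the given shared library.

    Hypothetical helper for illustration: it only checks that the library
    can be loaded, which is the step that fails in the traceback above.
    """
    try:
        ctypes.CDLL(soname)
    except OSError:
        # Raised when the loader cannot find or open the shared object.
        return False
    return True


if __name__ == "__main__":
    if cuda_driver_available():
        print("libcuda.so.1 can be loaded")
    else:
        print("libcuda.so.1 cannot be loaded; a tensorflow-gpu build "
              "linked against it will fail on import")
```

A successful load only shows that the driver library is present on the loader's search path. It does not prove that a usable GPU is attached to the host.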

A more reasonable approach is probably to prevent users from choosing the GPU tensorflow image for jobs that are meant to run on CPU machines.
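
One way to picture that recommendation is a check at job submission time that rejects a GPU image paired with a CPU-only request. The sketch below is purely illustrative: the function validate_job, the gpu_count field, and the heuristic that a GPU image has "gpu" in its name are all assumptions, not part of any scheduler described here.

```python
def validate_job(image, gpu_count):
    """Reject jobs that pair a GPU image with a request for zero GPUs.

    Assumptions for illustration only: a GPU image is recognised by the
    substring "gpu" in its name, and gpu_count is the number of GPUs the
    job asks the scheduler for.
    """
    is_gpu_image = "gpu" in image.lower()
    if is_gpu_image and gpu_count == 0:
        raise ValueError(
            "image %r looks like a GPU build but the job requests no GPU"
            % image
        )
    return True


if __name__ == "__main__":
    # A GPU image with a GPU request passes the check.
    validate_job("tensorflow/tensorflow:latest-gpu", 1)
    # A GPU image with no GPU request is rejected before scheduling.
    try:
        validate_job("tensorflow/tensorflow:latest-gpu", 0)
    except ValueError as err:
        print(err)
```

Failing early at submission gives the user a clear message instead of the libcuda.so.1 ImportError that appears only after the container has started on a CPU host.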
