A tensorflow-gpu image is of course meant to run on a host machine with a GPU. But what happens if the container gets scheduled onto a host that has no GPU?
```
# Import tensorflow
# python -c "import tensorflow"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/__init__.py", line 22, in <module>
    from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
Failed to load the native TensorFlow runtime.
See https://www.tensorflow.org/install/install_sources#common_installation_problems
for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.
```
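A tensorflow-gpu image should normally require a GPU, but what if the user wants to run it on a CPU? Admittedly, the requirement does not make much sense: if you chose tensorflow-gpu, the job should run on a GPU. Otherwise, why run it on a CPU at all?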
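Under the current scheduling logic, such jobs can be scheduled onto CPU-only machines. These machines have no CUDA libraries installed and do not use nvidia-docker either. In particular, libcuda.so.1, the CUDA driver library that nvidia-docker normally makes available to the container from the GPU host, is missing. So when the job runs import tensorflow, the GPU image inevitably fails to find the CUDA library and raises the error above.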
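To confirm this condition without going through the full TensorFlow traceback, the same library lookup can be reproduced directly. The following is a minimal sketch, assuming Python's standard ctypes module, which calls dlopen under the hood on Linux. The name cuda_driver_available is a hypothetical helper, not part of TensorFlow.

```python
import ctypes


def cuda_driver_available(soname="libcuda.so.1"):
    """Return True if the dynamic loader can resolve the given CUDA driver library.

    Hypothetical helper: it only checks this one shared object, nothing else.
    """
    try:
        # ctypes.CDLL asks the dynamic loader to open the library by soname.
        ctypes.CDLL(soname)
    except OSError:
        # On Linux this carries the same loader message seen in the traceback,
        # e.g. "libcuda.so.1: cannot open shared object file: No such file or directory".
        return False
    return True


if __name__ == "__main__":
    if cuda_driver_available():
        print("libcuda.so.1 resolved; this particular check passes")
    else:
        print("libcuda.so.1 not found; a tensorflow-gpu image will fail on import tensorflow")
```

Passing this check only means libcuda.so.1 can be loaded. The GPU build may still fail later if other CUDA libraries it links against are missing.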
```
# Run this command
# LD_DEBUG=libs python -c "import tensorflow"
ib/x86_64:/usr/lib (system search path)
475: trying file=/lib/x86_64-linux-gnu/tls/x86_64/libcuda.so.1
475: trying file=/lib/x86_64-linux-gnu/tls/libcuda.so.1
475: trying file=/lib/x86_64-linux-gnu/x86_64/libcuda.so.1
475: trying file=/lib/x86_64-linux-gnu/libcuda.so.1
475: trying file=/usr/lib/x86_64-linux-gnu/tls/x86_64/libcuda.so.1
475: trying file=/usr/lib/x86_64-linux-gnu/tls/libcuda.so.1
475: trying file=/usr/lib/x86_64-linux-gnu/x86_64/libcuda.so.1
475: trying file=/usr/lib/x86_64-linux-gnu/libcuda.so.1
475: trying file=/lib/tls/x86_64/libcuda.so.1
475: trying file=/lib/tls/libcuda.so.1
475: trying file=/lib/x86_64/libcuda.so.1
475: trying file=/lib/libcuda.so.1
475: trying file=/usr/lib/tls/x86_64/libcuda.so.1
475: trying file=/usr/lib/tls/libcuda.so.1
475: trying file=/usr/lib/x86_64/libcuda.so.1
475: trying file=/usr/lib/libcuda.so.1
475:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/__init__.py", line 22, in <module>
    from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
```
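The LD_DEBUG=libs trace lists every path the loader tried before giving up. The following is a minimal sketch for capturing such a trace and extracting those candidate paths. It assumes Python 3 and relies on glibc writing LD_DEBUG output to standard error when LD_DEBUG_OUTPUT is unset. The names trace_import and searched_paths are hypothetical helpers.

```python
import os
import re
import subprocess
import sys

# Matches trace lines such as "475: trying file=/usr/lib/libcuda.so.1".
# Real traces pad the PID prefix with spaces/tabs, hence the \s patterns.
_TRYING = re.compile(r"^\s*\d+:\s+trying file=(\S+)\s*$")


def trace_import(module="tensorflow"):
    """Run `python -c "import <module>"` with LD_DEBUG=libs and return its stderr."""
    env = dict(os.environ, LD_DEBUG="libs")
    proc = subprocess.run(
        [sys.executable, "-c", "import " + module],
        env=env,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.stderr


def searched_paths(ld_debug_output, soname=None):
    """Return the candidate paths the loader tried, in search order.

    If soname is given, keep only candidates for that library file.
    """
    paths = []
    for line in ld_debug_output.splitlines():
        match = _TRYING.match(line)
        if match and (soname is None or match.group(1).endswith("/" + soname)):
            paths.append(match.group(1))
    return paths
```

For example, searched_paths(trace_import(), soname="libcuda.so.1") should yield a list shaped like the trace above. The exact directories depend on the image's loader configuration.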
A more reasonable approach is probably to prevent users from running a GPU tensorflow image on CPU-only machines in the first place.
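One way to enforce this is an admission check in the scheduler that refuses to place a GPU image on a node without GPUs. The following is a minimal sketch built on two assumptions. First, it assumes a hypothetical hook that receives the image name and the node's GPU count. Second, it assumes GPU images can be recognised by "gpu" appearing in the repository name or tag, as with tensorflow-gpu. A real platform would more likely rely on an explicit label or GPU resource request.

```python
def looks_like_gpu_image(image):
    """Hypothetical heuristic: the last path component (repo:tag) mentions 'gpu'."""
    # "registry:5000/team/tensorflow-gpu:1.4" -> "tensorflow-gpu:1.4"
    last = image.rsplit("/", 1)[-1].lower()
    return "gpu" in last


def check_placement(image, node_gpu_count):
    """Return None if the placement is acceptable, otherwise a rejection reason."""
    if looks_like_gpu_image(image) and node_gpu_count == 0:
        return ("image %s appears to be a GPU build; a CPU-only node has no "
                "libcuda.so.1, so 'import tensorflow' would fail there" % image)
    return None
```

Rejecting at submission or scheduling time gives the user an immediate, readable reason, instead of the ImportError above surfacing only after the container has started.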