## Pitfalls I hit while learning web scraping, written up for fellow learners
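- If you see errors like these:
`ModuleNotFoundError: No module named 'js2xml'`
`NameError: name 'js2xml' is not defined`
then the library is the problem. The first means js2xml is not installed in the Python environment you are running; install it with `pip install js2xml`. The second means the name was used without `import js2xml` having run first.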
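- When converting a str to JSON, you may hit `JSONDecodeError: Extra data: line 1 column 234701 (char 234700)`.
This means the str is not valid JSON as a whole. The parser finished one complete JSON value and then found more text after it (here at character 234700). You can: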
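1. Find the indices `start` and `end` where the JSON begins and ends, then slice the string `s` with `s[start:end]`;
2. Trim the string with `strip('symbol')`, which removes the characters in `symbol` from both ends. Note that the argument is treated as a set of characters, not as a prefix or suffix string.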
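Below is a minimal sketch of both fixes. It assumes the payload is a single JSON object wrapped in JSONP-style text such as `callback({...});`. The wrapper, the sample string and the function name `parse_wrapped_json` are illustrative and do not come from any real site.

```python
import json


def parse_wrapped_json(text: str):
    """Slice out the JSON object between the first "{" and the last "}".

    Assumes exactly one JSON object is embedded in `text`.
    """
    start = text.find("{")
    end = text.rfind("}") + 1  # slice end is exclusive, so step past "}"
    if start == -1 or end == 0:
        raise ValueError("no JSON object found in text")
    return json.loads(text[start:end])


if __name__ == "__main__":
    raw = 'callback({"name": "demo", "items": [1, 2]});'

    # Fix 1: slice with start/end indices
    print(parse_wrapped_json(raw))

    # Fix 2: strip() drops any of the listed *characters* from both ends.
    # Every character of "callback(" and ");" is in the set and "{"/"}"
    # are not, so stripping stops exactly at the object's braces.
    print(json.loads(raw.strip("callback();")))
```

Both calls print the same dict. `strip` works on a character set, so it can cut into the payload whenever the payload itself begins or ends with one of those characters. For that reason, slicing is usually the safer of the two.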
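Or the text may contain several JSON documents in a row, for example one object per line (the JSON Lines layout). In that case, parse each one separately. A one-liner for that case:
`data = [json.loads(line) for line in open('tweets.json', 'r')]`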
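Here is a slightly more defensive sketch of the same idea, using only the standard `json` module. `load_json_lines` closes the file and skips blank lines; the UTF-8 encoding is an assumption. `iter_json_documents` handles documents that are simply concatenated rather than placed one per line: it lets `json.JSONDecoder.raw_decode` report where each value ends. Both function names are hypothetical.

```python
import json
from typing import Any, Iterator, List


def load_json_lines(path: str) -> List[Any]:
    """One JSON value per line; blank lines are ignored."""
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def iter_json_documents(text: str) -> Iterator[Any]:
    """Yield each JSON value from a string holding several back to back."""
    decoder = json.JSONDecoder()
    rest = text.lstrip()
    while rest:
        # raw_decode parses one value and returns (value, end_index)
        # instead of raising "Extra data" on whatever follows it
        obj, end = decoder.raw_decode(rest)
        yield obj
        rest = rest[end:].lstrip()


if __name__ == "__main__":
    print(list(iter_json_documents('{"a": 1} {"b": 2}\n[3]')))
```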
... (more pitfalls to be added)
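Back to the first pitfall above (`ModuleNotFoundError` versus `NameError`). A quick way to tell "not installed" from "not imported" is to ask the import system whether it can find the module at all, using `importlib.util.find_spec`. Here is a minimal sketch. `check_module` is a hypothetical helper, and its pip hint assumes the PyPI name matches the import name. That holds for js2xml but not for every package.

```python
import importlib.util


def check_module(name: str) -> str:
    """Say whether top-level module `name` can be found by this interpreter."""
    if importlib.util.find_spec(name) is None:
        # `import name` would raise ModuleNotFoundError
        return f"{name}: not installed here; try `pip install {name}`"
    # Found, so a NameError means `import name` never ran in your code
    return f"{name}: installed; add `import {name}` before using it"


if __name__ == "__main__":
    for mod in ("json", "js2xml"):
        print(check_module(mod))
```

Sometimes `pip install` succeeds but the error persists. The `pip` on PATH then probably belongs to a different Python. Running `python -m pip install js2xml` with the same interpreter avoids that.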
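After some time away, I ran `jupyter notebook` again and got this error:
'jupyter' is not recognized as an internal or external command, operable program or batch file.
Cause and fix: the Anaconda `Scripts` directory was not on PATH. Add `D:\Users\23525\Anaconda3\Scripts` to the PATH environment variable. That folder holds `jupyter-notebook.exe`, `pip.exe` and the other command-line tools.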
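To confirm that a PATH fix like this took effect, you can ask Python to resolve a command the way the shell does. Below is a small sketch using the standard `shutil.which`; the helper names are hypothetical. Keep in mind that a changed PATH only applies to terminals opened after the change.

```python
import os
import shutil


def find_command(cmd: str) -> str:
    """Report where `cmd` resolves on PATH, or that it does not."""
    location = shutil.which(cmd)
    if location is None:
        return f"{cmd}: not found on PATH"
    return f"{cmd}: {location}"


def path_dirs() -> list:
    """PATH split on the OS separator (';' on Windows, ':' elsewhere)."""
    return [d for d in os.environ.get("PATH", "").split(os.pathsep) if d]


if __name__ == "__main__":
    for cmd in ("jupyter", "pip"):
        print(find_command(cmd))
    # List any PATH entries that look like a Scripts folder
    print([d for d in path_dirs() if d.rstrip("\\/").lower().endswith("scripts")])
```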
Then a different error appeared:
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\Scripts\jupyter-notebook-script.py", line 6, in <module>
    from notebook.notebookapp import main
  File "C:\ProgramData\Anaconda3\lib\site-packages\notebook\notebookapp.py", line 47, in <module>
    from zmq.eventloop import ioloop
  File "C:\ProgramData\Anaconda3\lib\site-packages\zmq\__init__.py", line 47, in <module>
    from zmq import backend
  File "C:\ProgramData\Anaconda3\lib\site-packages\zmq\backend\__init__.py", line 40, in <module>
    reraise(*exc_info)
  File "C:\ProgramData\Anaconda3\lib\site-packages\zmq\utils\sixcerpt.py", line 34, in reraise
    raise value
  File "C:\ProgramData\Anaconda3\lib\site-packages\zmq\backend\__init__.py", line 27, in <module>
    _ns = select_backend(first)
  File "C:\ProgramData\Anaconda3\lib\site-packages\zmq\backend\select.py", line 27, in select_backend
    mod = __import__(name, fromlist=public_api)
  File "C:\ProgramData\Anaconda3\lib\site-packages\zmq\backend\cython\__init__.py", line 6, in <module>
    from . import (constants, error, message, context,
ImportError: DLL load failed: The specified module could not be found.
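Cause: every frame in the traceback is inside the `zmq` package (installed from the `pyzmq` distribution), and the step that fails is loading its compiled extension DLL. Searching for the error suggested reinstalling it.
Fix:
`pip uninstall pyzmq`
`pip install pyzmq`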
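This failure happens while loading a DLL, so `zmq` can look installed (its files are present) and still fail to import. A useful check therefore has to actually import it. Below is a minimal sketch with a hypothetical `check_import` helper that does this and prints the reinstall commands. It runs pip as `python -m pip` with the current interpreter, so pip works on the same installation. Note that the import name `zmq` differs from the PyPI name `pyzmq`.

```python
import importlib
import sys


def check_import(module: str, pip_name: str) -> str:
    """Import `module`; on failure suggest reinstalling `pip_name`."""
    try:
        importlib.import_module(module)
    except ImportError as exc:  # also covers ModuleNotFoundError and DLL load errors
        py = sys.executable  # run pip with this exact interpreter
        return (
            f"import {module} failed: {exc}\n"
            f'  "{py}" -m pip uninstall -y {pip_name}\n'
            f'  "{py}" -m pip install {pip_name}'
        )
    return f"import {module}: OK"


if __name__ == "__main__":
    print(check_import("zmq", "pyzmq"))
```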
The install then failed with the following error:
pip is configured with locations that require TLS/SSL, however the ssl module in Python is not available.
Collecting pyzmq
Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.")': /simple/pyzmq/
Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.")': /simple/pyzmq/
Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.")': /simple/pyzmq/
Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.")': /simple/pyzmq/
Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.")': /simple/pyzmq/
Could not fetch URL https://pypi.org/simple/pyzmq/: There was a problem confirming the ssl certificate: HTTPSConnectionPool(host='pypi.org', port=443): Max retries exceeded with url: /simple/pyzmq/ (Caused by SSLError("Can't connect to HTTPS URL because the SSL module is not available.")) - skipping
Could not find a version that satisfies the requirement pyzmq (from versions: )
No matching distribution found for pyzmq
pip is configured with locations that require TLS/SSL, however the ssl module in Python is not available.
Could not fetch URL https://pypi.org/simple/pip/: There was a problem confirming the ssl certificate: HTTPSConnectionPool(host='pypi.org', port=443): Max retries exceeded with url: /simple/pip/ (Caused by SSLError("Can't connect to HTTPS URL because the SSL module is not available.")) - skipping
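Cause:
I got the same "SSL module not available" error running the native pip that ships with Anaconda (currently 18.1). In my case it was a system path issue, which I solved by adding the following directories to my PATH variable:
`%Miniconda3_DIR%;%Miniconda3_DIR%\Library\mingw-w64\bin;%Miniconda3_DIR%\Library\usr\bin;%Miniconda3_DIR%\Library\bin;%Miniconda3_DIR%\Scripts;%Miniconda3_DIR%\bin;`
where `%Miniconda3_DIR%` should be replaced with your Miniconda (or Anaconda) installation path.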
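Below is a sketch that expands the directory list above for a given install root and checks whether the running Python can import `ssl`. The root `C:\Miniconda3` is a placeholder, not a path from this post. The sketch uses `ntpath`, so Windows-style paths come out the same on any OS.

```python
import ntpath


def conda_path_dirs(root: str) -> list:
    """The directories from the answer above, joined onto `root`."""
    subdirs = [r"Library\mingw-w64\bin", r"Library\usr\bin",
               r"Library\bin", "Scripts", "bin"]
    return [root] + [ntpath.join(root, sub) for sub in subdirs]


def conda_path_value(root: str) -> str:
    """Semicolon-separated string to prepend to the Windows PATH."""
    return ";".join(conda_path_dirs(root)) + ";"


def ssl_available() -> bool:
    """True if this interpreter can load its ssl module (pip needs it)."""
    try:
        import ssl  # noqa: F401  (imported only to test loading)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    print(conda_path_value(r"C:\Miniconda3"))
    print("ssl available:", ssl_available())
```

These appear to be the same directories that `conda activate` puts on PATH. Running from an activated Anaconda Prompt should therefore have the same effect without editing the system variable.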
Reference: https://stackoverflow.com/questions/53742171/pip-tls-ssl-however-the-ssl-module-in-python-is-not-available-problem
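When a program stops working after sitting unused for a while, reinstalling is the simplest fix. But I wanted to actually solve the problem and gain a little more control over my world. Finding a problem, solving it, summarizing it and preventing it, one step at a time: isn't that the enduring pattern of human progress? May the road of inheritance and exploration always stay lit.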