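Contents
I. Offline installation of Python 3.6.8
II. Downloading the offline dependency packages
III. Installing the crawler modules offline
IV. Downloading and installing browser drivers
V. Verifying versions and dependencies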
I. Offline installation of Python 3.6.8
Python download page 1: https://www.python.org/downloads/
Python download page 2: https://www.python.org/ftp/python/3.6.8/
Windows installer: python-3.6.8-amd64.exe
Windows portable (embeddable) package: python-3.6.8-embed-amd64.zip
Source tarball (to compile yourself): Python-3.6.8.tgz
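Because the installer is copied to the offline machine by hand, it can be worth confirming that the file arrived intact. The python.org release page lists an MD5 sum for each file, and PyPI shows a SHA256 hash for each wheel. The following is an optional sketch, not part of the original procedure. The function names are hypothetical, and the expected value has to be copied from the download page.

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in 1 MiB chunks to keep memory use low."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_file(path, expected, algorithm="md5"):
    """Compare the file's digest with a value copied from the download page (case and spaces ignored)."""
    return file_digest(path, algorithm) == expected.strip().lower()
```

Example: `verify_file("python-3.6.8-amd64.exe", "<MD5 from the release page>")`. For a wheel downloaded from PyPI, pass `algorithm="sha256"`.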
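II. Downloading the offline dependency packages
PyPI search for packages that support Python 3.6: https://pypi.org/search/?c=Programming Language :: Python :: 3.6
Python extension package mirror (Windows binaries): https://www.lfd.uci.edu/~gohlke/pythonlibs/
Selenium documentation (Chinese): https://python-selenium-zh.readthedocs.io/zh_CN/latest/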
Purpose | Module | Official page | Package file |
---|---|---|---|
pip dependency | setuptools | https://pypi.org/project/setuptools/ | setuptools-51.0.0-py3-none-any.whl |
Package installer | pip | https://pypi.org/project/pip/ | pip-20.3.3-py2.py3-none-any.whl |
requests dependency | certifi | https://pypi.org/project/certifi/ | certifi-2020.12.5-py2.py3-none-any.whl |
requests dependency | chardet | https://pypi.org/project/chardet/ | chardet-4.0.0-py2.py3-none-any.whl |
requests dependency | idna | https://pypi.org/project/idna/ | idna-2.10-py2.py3-none-any.whl |
requests dependency | urllib3 | https://pypi.org/project/urllib3/ | urllib3-1.26.2-py2.py3-none-any.whl |
HTTP library | requests | https://pypi.org/project/requests/ | requests-2.25.1-py2.py3-none-any.whl |
XML parsing library | lxml | https://pypi.org/project/lxml/ | lxml-4.6.2-cp36-cp36m-win_amd64.whl |
Browser automation framework | selenium | https://pypi.org/project/selenium/ | selenium-3.141.0-py2.py3-none-any.whl |
Text recognition (OCR) library | pytesseract | https://pypi.org/project/pytesseract/ | pytesseract-0.3.7.tar.gz |
tesserocr dependency | tesseract | https://pypi.org/project/tesseract/ | tesseract-0.1.3.tar.gz |
Image recognition library | tesserocr | https://pypi.org/project/tesserocr/ https://github.com/simonflueckiger/tesserocr-windows_build/releases | tesserocr-2.5.1.tar.gz tesserocr-2.4.0-cp36-cp36m-win_amd64.whl |
OCR engine | tesseract-ocr | https://digi.bib.uni-mannheim.de/tesseract/ | tesseract-ocr-w64-setup-v4.0.0.20181030.exe |
Array and matrix computing library | numpy | https://pypi.org/project/numpy/ | numpy-1.19.4-cp36-cp36m-win_amd64.whl |
Computer vision library | opencv-python | https://pypi.org/project/opencv-python/ | opencv_python-4.4.0.46-cp36-cp36m-win_amd64.whl |
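The `cp36-cp36m-win_amd64` part of a file name is why lxml, tesserocr, numpy and opencv-python must be the 64-bit Windows CPython 3.6 builds, while the `py3-none-any` and `py2.py3-none-any` wheels are pure Python and install anywhere. Wheel names follow the PEP 427 pattern `name-version[-build]-python-abi-platform.whl`. The sketch below splits a name into those tags and checks them against an assumed, simplified tag set for 64-bit CPython 3.6 on Windows. pip's real check is stricter because it matches whole tag triples, not each tag group separately.

```python
# Assumed, simplified tag set for 64-bit CPython 3.6 on Windows (pip's real check is stricter).
CP36_WIN64 = {
    "python": {"cp36", "py36", "py3"},
    "abi": {"cp36m", "abi3", "none"},
    "platform": {"win_amd64", "any"},
}


def parse_wheel_filename(filename):
    """Split 'name-version[-build]-python-abi-platform.whl' (PEP 427) into its parts.

    Compressed tags such as 'py2.py3' are returned as sets, e.g. {'py2', 'py3'}.
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: %s" % filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) not in (5, 6):  # 6 parts when the optional build tag is present
        raise ValueError("unexpected wheel file name: %s" % filename)
    name, version = parts[0], parts[1]
    python_tags, abi_tags, platform_tags = (set(p.split(".")) for p in parts[-3:])
    return name, version, python_tags, abi_tags, platform_tags


def is_compatible(filename, tags=CP36_WIN64):
    """True if each tag group of the wheel overlaps with the accepted tags."""
    _, _, python_tags, abi_tags, platform_tags = parse_wheel_filename(filename)
    return bool(
        python_tags & tags["python"]
        and abi_tags & tags["abi"]
        and platform_tags & tags["platform"]
    )


for wheel in [
    "lxml-4.6.2-cp36-cp36m-win_amd64.whl",
    "requests-2.25.1-py2.py3-none-any.whl",
    "numpy-1.19.4-cp37-cp37m-win_amd64.whl",  # hypothetical file built for Python 3.7
]:
    print(wheel, is_compatible(wheel))
```

A wheel that fails this check is the usual cause of pip's "is not a supported wheel on this platform" error.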
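III. Installing the crawler modules offline
1. Offline installation of .whl packages (run the commands in the folder that holds the downloaded files; upgrade setuptools and pip first, then install each dependency before the package that needs it)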
python -m pip install --upgrade setuptools-51.0.0-py3-none-any.whl
python -m pip install --upgrade pip-20.3.3-py2.py3-none-any.whl

python -m pip install certifi-2020.12.5-py2.py3-none-any.whl
python -m pip install chardet-4.0.0-py2.py3-none-any.whl
python -m pip install idna-2.10-py2.py3-none-any.whl
python -m pip install urllib3-1.26.2-py2.py3-none-any.whl
python -m pip install requests-2.25.1-py2.py3-none-any.whl
python -m pip install lxml-4.6.2-cp36-cp36m-win_amd64.whl
python -m pip install selenium-3.141.0-py2.py3-none-any.whl
python -m pip install tesserocr-2.4.0-cp36-cp36m-win_amd64.whl
python -m pip install numpy-1.19.4-cp36-cp36m-win_amd64.whl
python -m pip install opencv_python-4.4.0.46-cp36-cp36m-win_amd64.whl
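The commands above can also be driven from a small script that installs the wheels in the same order. This sketch assumes all .whl files sit in one folder (`wheel_dir` is a placeholder you supply) and adds pip's `--no-index` and `--find-links` options so that pip resolves dependencies only from that folder instead of trying to reach PyPI.

```python
import subprocess
import sys
from pathlib import Path

# Same order as the commands above: requests' dependencies first, then requests and the rest.
WHEELS = [
    "certifi-2020.12.5-py2.py3-none-any.whl",
    "chardet-4.0.0-py2.py3-none-any.whl",
    "idna-2.10-py2.py3-none-any.whl",
    "urllib3-1.26.2-py2.py3-none-any.whl",
    "requests-2.25.1-py2.py3-none-any.whl",
    "lxml-4.6.2-cp36-cp36m-win_amd64.whl",
    "selenium-3.141.0-py2.py3-none-any.whl",
    "tesserocr-2.4.0-cp36-cp36m-win_amd64.whl",
    "numpy-1.19.4-cp36-cp36m-win_amd64.whl",
    "opencv_python-4.4.0.46-cp36-cp36m-win_amd64.whl",
]


def pip_install_command(wheel, wheel_dir):
    """Build a pip command that installs one local wheel and looks for dependencies only in wheel_dir."""
    return [
        sys.executable, "-m", "pip", "install",
        "--no-index",                    # never contact PyPI
        "--find-links", str(wheel_dir),  # resolve dependencies from the local folder
        str(Path(wheel_dir) / wheel),
    ]


def install_all(wheel_dir, wheels=WHEELS):
    """Install the wheels one by one; check_call raises CalledProcessError at the first failure."""
    for wheel in wheels:
        cmd = pip_install_command(wheel, wheel_dir)
        print("running:", " ".join(cmd))
        subprocess.check_call(cmd)
```

Call `install_all()` with the folder that holds the wheels. Using `sys.executable` makes sure the packages go into the interpreter that runs the script, which matters when several Python versions are installed.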
2. Offline installation of .tar.gz packages: extract the archive, cd into the extracted directory, and run:
python setup.py install
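The extract-and-run steps can be sketched with the standard library's `tarfile` and `subprocess` modules. The function names are hypothetical, and the sketch assumes the archive unpacks into a single top-level folder, as source packages such as pytesseract-0.3.7.tar.gz normally do.

```python
import os
import subprocess
import sys
import tarfile


def extract_sdist(archive, dest):
    """Extract a .tar.gz source package into dest and return its top-level directory."""
    dest_root = os.path.realpath(dest)
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        if not names:
            raise ValueError("empty archive: %s" % archive)
        # Refuse members that would land outside dest (absolute paths or '..').
        for name in names:
            target = os.path.realpath(os.path.join(dest_root, name))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError("unsafe path in archive: %s" % name)
        if hasattr(tarfile, "data_filter"):  # newer Pythons: use the safe extraction filter
            tar.extractall(dest_root, filter="data")
        else:  # Python 3.6 has no filter argument
            tar.extractall(dest_root)
    top_level = names[0].split("/")[0]
    return os.path.join(dest_root, top_level)


def setup_py_install(src_dir):
    """Equivalent of 'cd src_dir' followed by 'python setup.py install'."""
    subprocess.check_call([sys.executable, "setup.py", "install"], cwd=src_dir)
```

For example, `setup_py_install(extract_sdist("pytesseract-0.3.7.tar.gz", "build"))` unpacks the archive into a `build` folder and installs it.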
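3. Installing tesseract-ocr
Tutorial on installing Python tesserocr: https://jingyan.baidu.com/article/6b18230972e3e6fb59e15909.html
(1) During installation, select the option to download the additional language data.
(2) Add the Tesseract-OCR installation directory to the PATH environment variable.
(3) After installation, copy the tessdata folder from the Tesseract-OCR root directory into the Python root directory; otherwise the following error is raised: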
RuntimeError: Failed to init API, possibly an invalid tessdata path: D:\Python\Python36\Python368/tessdata/
(4) Set the tesseract_cmd variable to the path of the installed tesseract.exe:
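from PIL import Image
import pytesseract

# pytesseract needs tesseract_cmd set manually to the full path of tesseract.exe
pytesseract.pytesseract.tesseract_cmd = r'D:\Python\install-depend\Tesseract-OCR\tesseract.exe'
# Example: recognize the text in an image ('test.png' is a placeholder file name)
print(pytesseract.image_to_string(Image.open('test.png')))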
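Steps (2) and (3) can be checked and automated from Python. This is a sketch under stated assumptions: the "Python root directory" is taken to be `sys.prefix`, which holds for a normal Windows install but not inside a virtual environment, and the function names are hypothetical. `shutil.copytree` is called only when the target does not exist yet, because its `dirs_exist_ok` option is not available on Python 3.6.

```python
import os
import shutil
import sys


def find_tesseract(fallback=None):
    """Return tesseract from PATH if step (2) worked, otherwise the given fallback path."""
    return shutil.which("tesseract") or fallback


def copy_tessdata(tesseract_root, python_root=sys.prefix):
    """Copy <tesseract_root>/tessdata to <python_root>/tessdata, as in step (3).

    python_root defaults to sys.prefix (an assumption: it is the install folder for a
    normal Windows install, but points elsewhere inside a virtual environment).
    """
    src = os.path.join(tesseract_root, "tessdata")
    dst = os.path.join(python_root, "tessdata")
    if not os.path.isdir(src):
        raise FileNotFoundError("no tessdata folder in %s" % tesseract_root)
    if not os.path.exists(dst):  # copytree has no dirs_exist_ok on Python 3.6
        shutil.copytree(src, dst)
    return dst
```

With the example install path used above, `copy_tessdata(r'D:\Python\install-depend\Tesseract-OCR')` copies the folder once, and `pytesseract.pytesseract.tesseract_cmd = find_tesseract(r'D:\Python\install-depend\Tesseract-OCR\tesseract.exe')` uses the PATH entry when step (2) worked and the explicit path otherwise.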
IV. Downloading and installing browser drivers
Browser | Check version at | Mirror / download page | Driver package |
---|---|---|---|
Google Chrome | chrome://version/ | http://chromedriver.storage.googleapis.com/index.html http://npm.taobao.org/mirrors/chromedriver | chromedriver_win32.zip |
Firefox | about:support | https://npm.taobao.org/mirrors/geckodriver https://github.com/mozilla/geckodriver/releases | geckodriver-v0.26.0-win64.zip |
Microsoft Edge | edge://version/ | https://developer.microsoft.com/en-us/microsoft-edge/tools/webdriver/ | edgedriver_win64.zip |
Opera |  | https://github.com/operasoftware/operachromiumdriver/releases | operadriver_win64.zip |
Internet Explorer | Settings > About Internet Explorer | http://selenium-release.storage.googleapis.com/index.html | IEDriverServer_x64_3.9.0.zip |
PhantomJS |  | https://phantomjs.org/download.html https://bitbucket.org/ariya/phantomjs/downloads | phantomjs-2.1.1-windows.zip |
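The ChromeDriver release has to match the major version of the installed Chrome shown at chrome://version/. The sketch below builds download URLs for the Chrome storage index in the table. It assumes that index's `LATEST_RELEASE_<major>` text files and its `<version>/chromedriver_win32.zip` layout, which cover older Chrome releases only, and a mirror works only if it uses the same layout and is passed as `base`. The function names are hypothetical and the version string is just an example.

```python
import urllib.request

BASE_URL = "http://chromedriver.storage.googleapis.com"


def chrome_major(chrome_version):
    """'87.0.4280.88' -> 87; the driver must match this major version."""
    return int(chrome_version.strip().split(".")[0])


def latest_release_url(chrome_version, base=BASE_URL):
    """URL of the text file naming the newest driver for this Chrome major version (assumed layout)."""
    return "%s/LATEST_RELEASE_%d" % (base, chrome_major(chrome_version))


def driver_zip_url(driver_version, base=BASE_URL, filename="chromedriver_win32.zip"):
    """URL of the Windows driver archive for an exact driver version (assumed layout)."""
    return "%s/%s/%s" % (base, driver_version.strip(), filename)


def resolve_driver_version(chrome_version, base=BASE_URL, timeout=10):
    """Fetch the LATEST_RELEASE_<major> file; this needs network access, so run it on an online machine."""
    with urllib.request.urlopen(latest_release_url(chrome_version, base), timeout=timeout) as resp:
        return resp.read().decode("ascii").strip()
```

On an online machine, `driver_zip_url(resolve_driver_version("87.0.4280.88"))` gives the archive to copy to the offline machine.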
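V. Verifying versions and dependencies
Check the Python version and architecture, then confirm in the interpreter that each module imports without errors:
python -V
Python 3.6.8
python
>>> import platform
>>> platform.architecture()
('64bit', 'WindowsPE')
>>> import requests
>>> import lxml
>>> import selenium
>>> from PIL import Image
>>> import pytesseract
>>> import tesserocr
>>> import cv2 as cv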
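Instead of typing the imports one at a time, a short script can try them all and report which ones fail. This is a sketch: the module list mirrors the imports above plus numpy, the function names are hypothetical, and a package without a `__version__` attribute is reported as "unknown version".

```python
import importlib
import platform

# Import names of the packages installed above (opencv-python is imported as cv2, Pillow as PIL).
MODULES = ["requests", "lxml", "selenium", "PIL", "pytesseract", "tesserocr", "numpy", "cv2"]


def check_modules(names=MODULES):
    """Import each module and map its name to a version string, or to the error if the import fails."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception as exc:  # ImportError, or any error raised while the module initialises
            report[name] = "FAILED: %s" % exc
        else:
            report[name] = str(getattr(module, "__version__", "unknown version"))
    return report


def print_report():
    """Print the interpreter version and architecture, then one line per module."""
    print("Python", platform.python_version(), platform.architecture()[0])
    for name, status in check_modules().items():
        print("%-12s %s" % (name, status))
```

Running `print_report()` on the offline machine should show Python 3.6.8, 64bit, and no FAILED lines.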