Installing Scrapy
Official site: https://scrapy.org/
Installation
On any operating system, Scrapy can be installed with pip, for example:
$ pip install scrapy
To confirm that Scrapy installed successfully, first check in Python that the scrapy module can be imported:
>>> import scrapy
>>> scrapy.version_info
(1, 8, 0)
Then check in a shell that the scrapy command runs:
$ scrapy
Scrapy 1.8.0 - no active project
Usage:
scrapy <command> [options] [args]
Available commands:
bench Run quick benchmark test
fetch Fetch a URL using the Scrapy downloader
genspider Generate new spider using pre-defined templates
runspider Run a self-contained spider (without creating a project)
settings Get settings values
shell Interactive scraping console
startproject Create new project
version Print Scrapy version
view Open URL in browser, as seen by Scrapy
[ more ] More commands available when run from project directory
Use "scrapy <command> -h" to see more info about a command
If both checks pass, Scrapy is installed correctly. As shown above, the installed version is 1.8.0, the latest release at the time of writing.
Note:
- During installation you may hit errors about missing components (such as the Visual C++ build tools); installing an offline package for the missing dependency usually resolves them.
- Seeing the usage listing above when you run scrapy in CMD does not guarantee a working install; run scrapy bench as the real check. If it completes without errors, the installation succeeded.
For platform-specific installation steps, see http://doc.scrapy.org/en/latest/intro/install.html#intro-install-platform-notes
Global commands
$ scrapy
Scrapy 1.7.3 - no active project
Usage:
scrapy <command> [options] [args]
Available commands:
bench Run quick benchmark test
## Benchmark this machine's crawling performance.
fetch Fetch a URL using the Scrapy downloader
## Download a page's source code and print it.
genspider Generate new spider using pre-defined templates
## Generate a new spider file.
runspider Run a self-contained spider (without creating a project)
## Unlike launching a crawler with crawl, this runs a standalone spider file directly: scrapy runspider <spider_file>.
settings Get settings values
## Print the current settings.
shell Interactive scraping console
## Enter Scrapy's interactive scraping console.
startproject Create new project
## Create a new crawler project.
version Print Scrapy version
view Open URL in browser, as seen by Scrapy
## Download the page's document content as Scrapy sees it and open it in a browser.
[ more ] More commands available when run from project directory
Use "scrapy <command> -h" to see more info about a command
Project commands
- scrapy startproject projectname: create a project.
- scrapy genspider spidername domain: create a spider. After creating the project, you still need to create a spider inside it.
- scrapy crawl spidername: run a spider. Mind the directory this command is run from: it must be inside the project.