help: show help information.
F:\wamp\www\scrapy>scrapy --help
Scrapy 1.4.0 - no active project
Usage:
scrapy <command> [options] [args]
Available commands:
bench Run quick benchmark test
fetch Fetch a URL using the Scrapy downloader
genspider Generate new spider using pre-defined templates
runspider Run a self-contained spider (without creating a project)
settings Get settings values
shell Interactive scraping console
startproject Create new project
version Print Scrapy version
view Open URL in browser, as seen by Scrapy
[ more ] More commands available when run from project directory
Use "scrapy <command> -h" to see more info about a command
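version: show the Scrapy version. To also see the versions of each component, use "version -v".
Installing the components through PyCharm is recommended, as it is quick and simple.
F:\wamp\www\scrapy\example>scrapy version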
Scrapy 1.4.0
F:\wamp\www\scrapy\example>scrapy version -v
Scrapy : 1.4.0
lxml : 4.0.0.0
libxml2 : 2.9.5
cssselect : 1.0.1
parsel : 1.2.0
w3lib : 1.18.0
Twisted : 17.5.0
Python : 3.6.1 (v3.6.1:69c0db5, Mar 21 2017, 17:54:52) [MSC v.1900 32 bit (Intel)]
pyOpenSSL : 17.3.0 (OpenSSL 1.1.0f 25 May 2017)
Platform : Windows-10-10.0.14393-SP0
startproject: create a new project.
F:\wamp\www\scrapy>scrapy startproject example
New Scrapy project 'example', using template directory 'C:\Users\***\AppData\Roaming\Python\Python36\site-packages\scrapy\templates\project', created in:
F:\wamp\www\scrapy\example
You can start your first spider with:
cd example
scrapy genspider example example.com
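For orientation, these are the main lines of the generated example/example/settings.py as the Scrapy 1.4 project template writes them (the many commented-out settings are omitted, and later versions add more). SPIDER_MODULES is where commands such as list look for spiders, and NEWSPIDER_MODULE is where genspider writes new ones, which is why genspider reports modules such as example.spiders.baidu.

```python
# example/example/settings.py (excerpt; comments and optional settings omitted)
BOT_NAME = 'example'

# Packages scanned for Spider subclasses (used by `scrapy list`, `scrapy crawl`)
SPIDER_MODULES = ['example.spiders']
# Package in which `scrapy genspider` creates new spider modules
NEWSPIDER_MODULE = 'example.spiders'

# Obey robots.txt rules (the project template turns this on)
ROBOTSTXT_OBEY = True
```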
genspider: create a new spider. A project can contain several spiders, but each spider's name must be unique.
F:\wamp\www\scrapy\example>scrapy genspider baidu www.baidu.com
Created spider 'baidu' using template 'basic' in module:
example.spiders.baidu
F:\wamp\www\scrapy\example>scrapy genspider google www.google.com
Created spider 'google' using template 'basic' in module:
example.spiders.google
list: list all spiders in the current project.
F:\wamp\www\scrapy\example>scrapy list
baidu
google
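view: download the page with Scrapy and open it in a browser, to check how the source Scrapy actually received renders there (this can differ from what you see when visiting the site normally).
F:\wamp\www\scrapy\example>scrapy view https://bangumi.bilibili.com/33/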
parse: fetch a page and parse it with the project's spider, by default calling that spider's parse method.
F:\wamp\www\scrapy\example>scrapy parse https://bangumi.bilibili.com/33/
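shell: a very powerful command. It opens an interactive console in which you can debug data extraction, inspect the page source, filter out the information you need, and so on.
F:\wamp\www\scrapy>scrapy shell https://bangumi.bilibili.com/33/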
.
.
.
[s] Available Scrapy objects:
[s] scrapy scrapy module (contains scrapy.Request, scrapy.Selector, etc)
[s] crawler <scrapy.crawler.Crawler object at 0x03592CD0>
[s] item {}
[s] request <GET https://bangumi.bilibili.com/33/>
[s] response <200 https://bangumi.bilibili.com/33/>
[s] settings <scrapy.settings.Settings object at 0x04E4C0F0>
[s] spider <DefaultSpider 'default' at 0x5273150>
[s] Useful shortcuts:
[s] fetch(url[, redirect=True]) Fetch URL and update local objects (by default, redirects are followed)
[s] fetch(req) Fetch a scrapy.Request and update local objects
[s] shelp() Shell help (print this help)
[s] view(response) View response in a browser
runspider: run a self-contained spider, i.e. a single Python file, without creating a project.
F:\wamp\www\scrapy>scrapy runspider baidu.py
bench: run a quick benchmark test; it is often used to check whether Scrapy was installed successfully.