Scrapy: Command-Line Tools

2018-05-30 16:58:33

help: Show help information.

F:\wamp\www\scrapy>scrapy --help
Scrapy 1.4.0 - no active project

Usage:
  scrapy <command> [options] [args]

Available commands:
  bench         Run quick benchmark test
  fetch         Fetch a URL using the Scrapy downloader
  genspider     Generate new spider using pre-defined templates
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

  [ more ]      More commands available when run from project directory

Use "scrapy <command> -h" to see more info about a command

version: Show version information. To see the version of each component as well, use the "version -v" command.

Installing the components through PyCharm is recommended; it is quick and simple.

F:\wamp\www\scrapy\example>scrapy version
Scrapy 1.4.0

F:\wamp\www\scrapy\example>scrapy version -v
Scrapy    : 1.4.0
lxml      : 4.0.0.0
libxml2   : 2.9.5
cssselect : 1.0.1
parsel    : 1.2.0
w3lib     : 1.18.0
Twisted   : 17.5.0
Python    : 3.6.1 (v3.6.1:69c0db5, Mar 21 2017, 17:54:52) [MSC v.1900 32 bit (Intel)]
pyOpenSSL : 17.3.0 (OpenSSL 1.1.0f  25 May 2017)
Platform  : Windows-10-10.0.14393-SP0
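
The "Name : version" layout above is easy to read from a script, for example to compare component versions across machines. The following is a minimal sketch, not part of Scrapy: the helper name `parse_version_output` is hypothetical, and the sample text is copied from the output above rather than captured from a live command.

```python
def parse_version_output(text):
    """Parse 'Name : value' lines into a dict, splitting on the first colon only."""
    versions = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        # Skip blank lines and lines without a colon separator.
        if sep and name.strip():
            versions[name.strip()] = value.strip()
    return versions


# Sample lines copied from the "scrapy version -v" output shown above.
sample = """\
Scrapy    : 1.4.0
lxml      : 4.0.0.0
Twisted   : 17.5.0
Python    : 3.6.1 (v3.6.1:69c0db5, Mar 21 2017, 17:54:52) [MSC v.1900 32 bit (Intel)]
"""

versions = parse_version_output(sample)
print(versions["Scrapy"])
print(versions["Twisted"])
```

Splitting on the first colon only keeps values such as the Python build string intact, even though they contain further colons.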

startproject: Create a project.

F:\wamp\www\scrapy>scrapy startproject example
New Scrapy project 'example', using template directory 'C:\Users\***\AppData\Roaming\Python\Python36\site-packages\scrapy\templates\project', created in:
    F:\wamp\www\scrapy\example

You can start your first spider with:
    cd example
    scrapy genspider example example.com

genspider: Create a spider. A project can contain multiple spiders, but each spider's name must be unique.

F:\wamp\www\scrapy\example>scrapy genspider baidu www.baidu.com
Created spider 'baidu' using template 'basic' in module:
  example.spiders.baidu

F:\wamp\www\scrapy\example>scrapy genspider google www.google.com
Created spider 'google' using template 'basic' in module:
  example.spiders.google
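
To give a rough idea of what genspider writes into example/spiders/baidu.py, the sketch below fills a small template with the spider name and domain. The template text is an approximation of the 'basic' template written from memory, not copied from the installed package, and `render_spider` together with its class-naming rule is a hypothetical helper used only for illustration.

```python
from string import Template

# Approximation of Scrapy's 'basic' spider template (an assumption, not the
# exact file shipped with Scrapy).
BASIC_TEMPLATE = Template('''\
import scrapy


class ${classname}(scrapy.Spider):
    name = '${name}'
    allowed_domains = ['${domain}']
    start_urls = ['http://${domain}/']

    def parse(self, response):
        pass
''')


def render_spider(name, domain):
    """Fill the template; the class-name rule is a hypothetical convention."""
    classname = "".join(part.capitalize() for part in name.split("_")) + "Spider"
    return BASIC_TEMPLATE.substitute(classname=classname, name=name, domain=domain)


source = render_spider("baidu", "www.baidu.com")
print(source)
```

The `name` attribute is what identifies the spider to commands that take a spider name, which is why it has to be unique within a project.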

list: List all spiders in the current project.

F:\wamp\www\scrapy\example>scrapy list
baidu
google

view: Opens the given URL in a browser so you can see how the page source, as downloaded by Scrapy, actually renders.

F:\wamp\www\scrapy\example>scrapy view https://bangumi.bilibili.com/33/

parse: Parse a given page with the parse function defined in the project's spider.

F:\wamp\www\scrapy\example>scrapy parse https://bangumi.bilibili.com/33/

shell: A very powerful command. It can be used to debug extracted data, fetch page source, filter information, and more.

F:\wamp\www\scrapy>scrapy shell https://bangumi.bilibili.com/33/
.
.
.
[s] Available Scrapy objects:
[s]   scrapy     scrapy module (contains scrapy.Request, scrapy.Selector, etc)
[s]   crawler    <scrapy.crawler.Crawler object at 0x03592CD0>
[s]   item       {}
[s]   request    <GET https://bangumi.bilibili.com/33/>
[s]   response   <200 https://bangumi.bilibili.com/33/>
[s]   settings   <scrapy.settings.Settings object at 0x04E4C0F0>
[s]   spider     <DefaultSpider 'default' at 0x5273150>
[s] Useful shortcuts:
[s]   fetch(url[, redirect=True]) Fetch URL and update local objects (by default, redirects are followed)
[s]   fetch(req)                  Fetch a scrapy.Request and update local objects
[s]   shelp()           Shell help (print this help)
[s]   view(response)    View response in a browser

runspider: Run a self-contained spider, without creating a project.

F:\wamp\www\scrapy>scrapy runspider baidu.py

bench: Run a benchmark test. It is commonly used to check whether Scrapy was installed successfully.
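
Before running the full benchmark, a lighter check can be done from Python itself. This is a minimal sketch rather than a replacement for bench: the function name `scrapy_install_status` is hypothetical, and it only reports whether the package and the command can be found, not whether a crawl actually works.

```python
import importlib.util
import shutil


def scrapy_install_status():
    """Report whether the scrapy package and the scrapy executable can be found."""
    return {
        # True if "import scrapy" would find the package in this interpreter.
        "package_importable": importlib.util.find_spec("scrapy") is not None,
        # True if a "scrapy" executable is on the PATH.
        "command_on_path": shutil.which("scrapy") is not None,
    }


status = scrapy_install_status()
print(status)
```

If either value is False, the scrapy command shown throughout this post is unlikely to run from the current environment.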
