Background: Web crawling has become the main way to collect Internet data automatically. Requests is a third-party Python module that handles everyday HTTP requests and is simple to use, so this article introduces how to use the Requests library.
1 The 7 main methods of the Requests library
The 7 main methods are requests.request(), requests.get(), requests.head(), requests.post(), requests.put(), requests.patch(), and requests.delete(); each is described in Section 3. For web crawlers, the two most frequently used are get() and head().
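As a quick illustration, here is a minimal sketch of both methods; the httpbin.org test URL is an assumption chosen for demonstration, not part of the original text.

import requests

# GET fetches the full resource; HEAD fetches only the response headers.
r = requests.get("https://httpbin.org/get")
print(r.status_code)   # e.g. 200
print(r.text[:100])    # first 100 characters of the body

h = requests.head("https://httpbin.org/get")
print(h.headers["Content-Type"])  # headers are still returned
print(len(h.text))                # 0: HEAD responses carry no body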
2 Operations on resources in the HTTP protocol
Each Requests method maps to an HTTP operation on a URL-identified resource: GET retrieves the resource, HEAD retrieves only its headers, POST appends new data to it, PUT replaces it entirely, PATCH modifies part of it, and DELETE removes it.
3 The 7 methods of the Requests library explained
3.1 requests.request()
requests.request(method, url, **kwargs)
- method: the request method, one of 'GET'/'HEAD'/'POST'/'PUT'/'PATCH'/'DELETE'/'OPTIONS' (7 in total);
- url: the URL of the page to fetch;
- **kwargs: 13 optional keyword arguments that control the request (a usage sketch follows this list).
- params: dict or byte sequence, appended to the url as query parameters;
- data: dict, byte sequence, or file object, sent as the body of the Request;
- json: data in JSON format, sent as the body of the Request;
- headers: dict of custom HTTP headers;
- cookies: dict or CookieJar, cookies to send with the Request;
- auth: tuple, enables HTTP authentication;
- files: dict, used to upload files;
- timeout: timeout in seconds;
- proxies: dict of proxy servers to route the request through; the proxy URLs may include login credentials;
- allow_redirects: True/False, default True; switch for following redirects;
- stream: True/False, default False; if True, the response body is not downloaded immediately but streamed when accessed;
- verify: True/False, default True; switch for SSL certificate verification;
- cert: path to a local SSL client certificate.
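A short sketch combining several of these keyword arguments in a single requests.request() call; the URL, header, and parameter values are illustrative assumptions (httpbin.org is a public echo service).

import requests

r = requests.request(
    "GET",
    "https://httpbin.org/get",
    params={"q": "python"},                     # appended to the URL as ?q=python
    headers={"User-Agent": "my-crawler/0.1"},   # custom HTTP header
    timeout=10,                                 # give up after 10 seconds
    allow_redirects=True,                       # follow redirects (the default)
    verify=True,                                # verify the SSL certificate (the default)
)
print(r.url)          # https://httpbin.org/get?q=python
print(r.status_code)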
3.2 requests.get()
requests.get(url, params=None, **kwargs)
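With get(), params is merged into the URL's query string, which is easy to verify via r.url. A minimal sketch, assuming the httpbin.org test endpoint:

import requests

payload = {"key1": "value1", "key2": "value2"}
r = requests.get("https://httpbin.org/get", params=payload)
print(r.url)   # https://httpbin.org/get?key1=value1&key2=value2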
3.3 requests.head()
requests.head(url, **kwargs)
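head() retrieves only the response headers, which makes it a cheap way to inspect a resource before downloading it. A minimal sketch, again assuming httpbin.org:

import requests

r = requests.head("https://httpbin.org/get")
print(r.headers)   # the server's response headers
print(r.text)      # empty string: a HEAD response has no body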
3.4 requests.post()
requests.post(url, data=None, json=None, **kwargs)
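With post(), data sends a form-encoded body while json sends a JSON body. A minimal sketch against httpbin.org's /post echo endpoint (an assumption for demonstration):

import requests

# form-encoded body (Content-Type: application/x-www-form-urlencoded)
r1 = requests.post("https://httpbin.org/post", data={"key1": "value1"})
print(r1.json()["form"])   # {'key1': 'value1'}

# JSON body (Content-Type: application/json)
r2 = requests.post("https://httpbin.org/post", json={"key1": "value1"})
print(r2.json()["json"])   # {'key1': 'value1'}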
3.5 requests.put()
requests.put(url, data=None, **kwargs)
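put() submits a complete replacement for the resource at the given URL. A minimal sketch against httpbin.org's /put echo endpoint (an assumption):

import requests

r = requests.put("https://httpbin.org/put", data={"key1": "new value"})
print(r.json()["form"])   # {'key1': 'new value'}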
3.6 requests.patch()
requests.patch(url, data=None, **kwargs)
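patch() differs from put() in intent: it submits only the fields to be changed rather than a full replacement, which saves bandwidth for large resources. A minimal sketch against httpbin.org's /patch echo endpoint (an assumption):

import requests

r = requests.patch("https://httpbin.org/patch", data={"key2": "updated"})
print(r.json()["form"])   # only the modified field is transmitted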
3.7 requests.delete()
requests.delete(url, **kwargs)
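delete() asks the server to remove the resource at the given URL. A minimal sketch; httpbin.org's /delete endpoint (an assumption here) simply acknowledges the request:

import requests

r = requests.delete("https://httpbin.org/delete")
print(r.status_code)   # 200 if the server accepted the request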
4 Request and Response Objects
Whenever a call is made to requests.get() and friends, you are doing two major things. First, you are constructing a Request object which will be sent off to a server to request or query some resource. Second, a Response object is generated once Requests gets a response back from the server. The Response object contains all of the information returned by the server and also contains the Request object you created originally.
r = requests.get(url)
This call returns a Response object containing the server's resource.
If we want to access the headers the server sent back to us, we do this:
r.headers
However, if we want to get the headers we sent the server, we simply access the request, and then the request’s headers:
r.request.headers
5 Attributes of the Response object
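The most commonly used attributes are status_code, text, encoding, apparent_encoding, content, headers, and url. A minimal sketch, assuming the httpbin.org test URL:

import requests

r = requests.get("https://httpbin.org/get")
print(r.status_code)          # HTTP status code, e.g. 200
print(r.encoding)             # encoding guessed from the HTTP headers
print(r.apparent_encoding)    # encoding guessed from the body content
print(r.headers)              # response headers
print(r.url)                  # final URL after any redirects
print(r.text[:200])           # body decoded as text
print(len(r.content))         # body as raw bytes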
References:
[1] China University MOOC: Python Web Crawling and Information Extraction (https://www.icourse163.org/course/BIT-1001870001)
[2] Requests: HTTP for Humans (https://requests.readthedocs.io/en/master/)
[3] Python crawler basics: using the requests library and its parameters explained (https://blog.csdn.net/weixin_45887687/article/details/106162634)