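The download target is the popular images on Duitang (堆糖网). Open the page and scroll down, and you will find that the images are loaded via AJAX. Press F12 to open the developer tools, select the Network tab and filter by XHR, then keep scrolling to find the API that the AJAX requests call, as shown in the figure below.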
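You can then construct requests to fetch the JSON data that contains the image URLs. For I/O-bound tasks such as network requests, using a process pool can speed up the downloads.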
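The request-building step can be sketched on its own, without touching the network. This is a minimal sketch: the helper names `build_params` and `extract_cover_urls` and the `sample` payload are hypothetical, and the payload shape is only inferred from the fields the script reads (`data`, `object_list`, `album`, `covers`), not from any API documentation.

```python
PAGE_SIZE = 24  # same value as the 'limit' parameter used in this post


def build_params(page, page_size=PAGE_SIZE):
    """Return the query parameters for a zero-based page number."""
    return {
        'include_fields': 'top_comments,is_root,source_link,item,buyable,root_id,status,like_count,sender,album',
        'limit': str(page_size),
        'start': page_size * page,  # offset of the first item on this page
    }


def extract_cover_urls(json_data):
    """Collect the first cover URL of every album in one response."""
    urls = []
    for item in json_data['data']['object_list']:
        covers = item['album']['covers']
        if covers:  # skip albums that have no cover image
            urls.append(covers[0])
    return urls


# Hypothetical payload, shaped like the fields the script accesses.
sample = {
    'data': {
        'object_list': [
            {'album': {'name': 'cats', 'covers': ['https://example.com/a.jpg']}},
            {'album': {'name': 'empty', 'covers': []}},
        ]
    }
}

print(build_params(2)['start'])    # 48
print(extract_cover_urls(sample))  # ['https://example.com/a.jpg']
```

Keeping the parsing separate from the HTTP call makes it easy to check against a saved response before running the full crawl.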
The code is as follows:
import os
import re
from multiprocessing import Pool

import requests
from requests import exceptions


def get_pic_info():
    """Yield a dict with a file name and a URL for each hot image."""
    url = 'https://www.duitang.com/napi/index/hot/?'
    for i in range(1000):
        params = {
            'include_fields': 'top_comments,is_root,source_link,item,buyable,root_id,status,like_count,sender,album',
            'limit': '24',
            'start': 24 * i,  # offset of the first item on page i
        }
        response = requests.get(url, params=params)
        json_data = response.json()
        pic_list = json_data['data']['object_list']
        for pic_ in pic_list:
            image = {}
            pic_info = pic_['album']
            pic_url = pic_info['covers'][0]
            # Strip characters that are invalid or awkward in file names,
            # then append the extension taken from the image URL.
            image['pic_name'] = re.sub(r'[\\/:*?"<>|\r\n。,.? ]+', '', pic_info['name']) + '.' + pic_url.split('.')[-1]
            image['pic_url'] = pic_url
            yield image


def download_pic(image):
    """Download one image into ./img unless the file already exists."""
    if not os.path.exists(f'./img/{image["pic_name"]}'):
        try:
            resp = requests.get(image['pic_url'])
            if resp.status_code == 200:
                with open(f'./img/{image["pic_name"]}', 'wb') as f:
                    f.write(resp.content)
        except exceptions.RequestException:
            # Skip this image on any network error.
            return None
    else:
        print(image['pic_name'] + ' has already been downloaded')


if __name__ == '__main__':
    if not os.path.exists('./img'):
        os.mkdir('./img')
    pool = Pool()
    # Pool.map turns the generator into a list before dispatching any work,
    # so all pages are fetched first and the downloads start afterwards.
    pool.map(download_pic, get_pic_info())
    pool.close()
    pool.join()
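Because the work here is mostly waiting on the network, a thread pool is a common alternative to a process pool for this kind of task. The sketch below is not the approach used in this post: it swaps in `concurrent.futures.ThreadPoolExecutor` from the standard library, and `fake_download` and `run_in_threads` are hypothetical names, with `fake_download` sleeping instead of making a real request so the example runs offline.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(image):
    """Hypothetical stand-in for download_pic: sleeps instead of using the network."""
    time.sleep(0.05)  # simulate waiting on I/O
    return image['pic_name']


def run_in_threads(images, worker, max_workers=8):
    """Apply worker to every image using a pool of threads; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(worker, images))


# A generator works as input, just like get_pic_info() in the script.
images = ({'pic_name': f'pic_{i}.jpg'} for i in range(16))
start = time.perf_counter()
names = run_in_threads(images, fake_download)
elapsed = time.perf_counter() - start
print(len(names), 'items in', round(elapsed, 2), 'seconds')
```

Threads share one process, so there is no need to pickle the `image` dicts or guard the entry point with `if __name__ == '__main__'`. Whether this is faster than `Pool` for real downloads depends on the machine and the network, so it is worth measuring both.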
Published by 全栈程序员栈长. Please credit the source when reposting: https://javaforall.cn/155277.html. Original link: https://javaforall.cn