手把手教你用Python爬取猫眼电影Top100榜

2023-11-29 17:04:04 浏览数 (2)

1.观察网站

打开猫眼排行榜网站 按下F12后刷新 搜索第一个的名字可以发现

这就是包含前10个电影的json链接:https://m.maoyan.com/asgard/asgardapi/mmdb/movieboard/moviedetail/fixedboard/39.json?ci=1&year=0&term=0&limit=10&offset=0

观察链接发现,其中有limit=10,我们来试下将其改成100看看能不能一次性获取所有电影信息

根据返回的结果可以看到 直接将limit改为100丝毫没有问题

2.开始编写代码

使用requests库来进行get请求

未下载requests库的可以使用pip安装,其中-i 为指定下载地址 我们选择腾讯镜像源

代码语言:javascript复制
pip install -i https://mirrors.cloud.tencent.com/pypi/simple requests
代码语言:javascript复制
import requests
headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36'
}
url = 'https://m.maoyan.com/asgard/asgardapi/mmdb/movieboard/moviedetail/fixedboard/39.json?ci=1&year=0&term=0&limit=100&offset=0'
req = requests.get(url,headers=headers).json()
print(req)

运行后可以看到 数据已经获取成功

当然 只获取到json还是不够的,因为不符合人类的阅读习惯,我们来提取需要的字段 并保存为markdown文件

生成markdown表格

代码语言:javascript复制
def generate_markdown_table(header, data):
    # 生成表头
    table = "| "   " | ".join(header)   " |n"
    # 生成分隔线
    table  = "| "   " | ".join(["---"] * len(header))   " |n"
    # 生成数据行
    for row in data:
        table  = "| "   " | ".join(str(cell) for cell in row)   " |n"
    return table

提取数据并保存为md文件

代码语言:javascript复制
header_list = ["封面", "电影名称", "电影评分", "电影类型", "上映时间", "主演","想看人数"]
data_list = []
for movie in req:
    movie_data = ['![{0}]({1})'.format(movie['nm'], movie['img'].replace('2500x2500', '300x500')), movie['nm'], movie['label']['number'], movie['cat'], movie['pubDesc'], movie['star'],movie['wish']]
    data_list.append(movie_data)

# 生成Markdown表格
markdown_table = generate_markdown_table(header_list, data_list)
with open("maoyantop100.md", "w", encoding='utf8') as file:
    file.write(markdown_table)

打开效果

3.完整代码

代码语言:javascript复制
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36'
}
url = 'https://m.maoyan.com/asgard/asgardapi/mmdb/movieboard/moviedetail/fixedboard/39.json?ci=1&year=0&term=0&limit=100&offset=0'
req = requests.get(url, headers=headers).json()['data']['movies']
# print(req)

def generate_markdown_table(header, data):
    # 生成表头
    table = "| "   " | ".join(header)   " |n"
    # 生成分隔线
    table  = "| "   " | ".join(["---"] * len(header))   " |n"
    # 生成数据行
    for row in data:
        table  = "| "   " | ".join(str(cell) for cell in row)   " |n"
    return table


header_list = ["封面", "电影名称", "电影评分", "电影类型", "上映时间", "主演","想看人数"]
data_list = []
for movie in req:
    movie_data = ['![{0}]({1})'.format(movie['nm'], movie['img'].replace('2500x2500', '300x500')), movie['nm'], movie['label']['number'], movie['cat'], movie['pubDesc'], movie['star'],movie['wish']]
    data_list.append(movie_data)

# 生成Markdown表格
markdown_table = generate_markdown_table(header_list, data_list)
with open("maoyantop100.md", "w", encoding='utf8') as file:
    file.write(markdown_table)

0 人点赞