30 Python Web Scraping Code Examples (To Be Continued)

2023-05-22 14:36:06

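When learning web scraping with Python, keep the following points in mind:

1. Legality: when scraping a site, follow the site's rules and the applicable laws and regulations. Do not scrape illegally or infringe on other people's privacy.

2. Speed: control the crawl rate so the scraper does not put excessive load on the site.

3. Data processing and storage: after scraping, process and store the data so it can be analyzed and used later.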
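
The first two points can be sketched with the standard library alone. The block below is a minimal sketch under stated assumptions, not a complete crawler: `build_robots_parser`, `Throttle`, the `my-bot` user agent, the sample rules, and the example.com addresses are all hypothetical names chosen for illustration. It parses robots.txt rules with `urllib.robotparser` and enforces a minimum delay between requests.

```python
import time
from urllib.robotparser import RobotFileParser


def build_robots_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules that have already been downloaded as text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class Throttle:
    """Enforce a minimum interval (in seconds) between consecutive requests."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last_call = None  # time of the previous request, if any

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return the seconds slept."""
        slept = 0.0
        if self._last_call is not None:
            remaining = self.min_interval - (time.monotonic() - self._last_call)
            if remaining > 0:
                time.sleep(remaining)
                slept = remaining
        self._last_call = time.monotonic()
        return slept


# Hypothetical rules; a real crawler would download the site's robots.txt first.
SAMPLE_ROBOTS = "User-agent: *\nDisallow: /private/\n"
robots = build_robots_parser(SAMPLE_ROBOTS)
throttle = Throttle(min_interval=0.05)

for page in ["https://example.com/news", "https://example.com/private/data"]:
    if not robots.can_fetch("my-bot", page):
        print("skipping (disallowed):", page)
        continue
    throttle.wait()
    print("allowed:", page)  # the call to requests.get(page) would go here
```

A very short interval is used here only so the demonstration finishes quickly; a delay of a second or more is a more considerate starting point for a real site.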

The following resources are useful for learning Python web scraping:

1. Python official documentation (standard library reference): https://docs.python.org/3/library/index.html

2. Python web scraping tutorial: https://www.runoob.com/python/python-web-scraping.html

3. Scrapy official documentation: https://docs.scrapy.org/en/latest/

4. Beautiful Soup official documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/

5. Requests official documentation: https://docs.python-requests.org/en/latest/

For an introduction to Python itself, see:

1. Python beginner tutorial: https://www.runoob.com/python/python-tutorial.html

2. The Python Tutorial (official): https://docs.python.org/3/tutorial/index.html

3. Python Crash Course (Python编程:从入门到实践): https://book.douban.com/subject/26829016/

4. Python Cookbook: https://book.douban.com/subject/26829016/

5. Python Data Science Handbook (Python数据科学手册): https://book.douban.com/subject/30293801/

30 Code Examples

1. Scraping weather forecast data

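import requests
from bs4 import BeautifulSoup

url = ''  # set this to the forecast page you want to scrape
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on an HTTP error status
response.encoding = 'utf-8'
soup = BeautifulSoup(response.text, 'html.parser')
# The class names 'wea' and 'tem' depend on the target page's markup;
# find() returns None when no matching element exists.
weather = soup.find('p', class_='wea').text.strip()
temperature = soup.find('p', class_='tem').text.strip()
print('Weather:', weather)
print('Temperature:', temperature)

# Test case
# Expected output:
# Weather: 晴
# Temperature: 22℃ / 9℃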
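
The scraped temperature is still a string. If later analysis needs numbers, one possible approach is to pull the integers out with a regular expression. This is only a sketch that assumes the `high / low` format shown in the expected output above; `parse_temperature_range` is a hypothetical helper, not part of the original script.

```python
import re
from typing import Tuple


def parse_temperature_range(text: str) -> Tuple[int, int]:
    """Extract (high, low) in degrees Celsius from text such as '22℃ / 9℃'."""
    numbers = [int(n) for n in re.findall(r"-?\d+", text)]
    if len(numbers) != 2:
        raise ValueError(f"expected two temperatures, got: {text!r}")
    high, low = numbers
    return high, low


print(parse_temperature_range("22℃ / 9℃"))  # (22, 9)
```

Raising `ValueError` on an unexpected format makes a layout change on the page visible immediately instead of silently storing wrong numbers.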

2. Scraping stock data

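import requests
from bs4 import BeautifulSoup

url = ''  # set this to the stock quote page you want to scrape
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on an HTTP error status
response.encoding = 'utf-8'
soup = BeautifulSoup(response.text, 'html.parser')
# These class names depend on the target page's markup. 'c-rise' appears to
# mark a rising price, so this lookup may return None on a falling day.
price = soup.find('strong', class_='last').text.strip()
change = soup.find('strong', class_='c-rise').text.strip()
print('Price:', price)
print('Change:', change)

# Test case
# Expected output:
# Price: 1746.00
# Change: 0.52%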
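
Both values arrive as text. For any arithmetic on them, `decimal.Decimal` avoids the rounding surprises of binary floats. The helpers below are a hypothetical sketch (`parse_price` and `parse_percent` are not part of the original script) and assume the plain formats shown in the expected output above.

```python
from decimal import Decimal


def parse_price(text: str) -> Decimal:
    """Convert a scraped price such as '1746.00' or '1,746.00' to Decimal."""
    return Decimal(text.strip().replace(",", ""))


def parse_percent(text: str) -> Decimal:
    """Convert a scraped percentage such as '0.52%' to a fraction (0.0052)."""
    return Decimal(text.strip().rstrip("%")) / Decimal(100)


print(parse_price("1746.00"))   # 1746.00
print(parse_percent("0.52%"))   # 0.0052
```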

3. Scraping articles from a news site

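import requests
from bs4 import BeautifulSoup

url = ''  # set this to the news listing page you want to scrape
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on an HTTP error status
response.encoding = 'utf-8'
soup = BeautifulSoup(response.text, 'html.parser')
# 'news-item' depends on the target page's markup.
news_list = soup.find_all('a', class_='news-item')
for news in news_list:
    title = news.text.strip()
    link = news.get('href', '')  # .get avoids a KeyError when href is missing
    print(title)
    print(link)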
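
`href` values are often relative (for example `/a/1.html`), so a printed link may not be usable on its own. A minimal sketch of resolving such links against the page address with `urllib.parse.urljoin` follows; `absolute_links`, the base URL, and the paths are made-up placeholders.

```python
from urllib.parse import urljoin


def absolute_links(base_url: str, hrefs: list) -> list:
    """Resolve each href against base_url, dropping empty values and duplicates."""
    seen = set()
    result = []
    for href in hrefs:
        if not href:
            continue
        full = urljoin(base_url, href)
        if full not in seen:
            seen.add(full)
            result.append(full)
    return result


# Hypothetical values standing in for `url` and the scraped href strings.
links = absolute_links(
    "https://example.com/news/index.html",
    ["/a/1.html", "2.html", "https://other.example.org/x", "/a/1.html"],
)
for link in links:
    print(link)
```

Deduplicating while preserving order keeps the first occurrence of each article, which matters when a page links the same story from several places.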

4. Scraping movie information and ratings

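import requests
from bs4 import BeautifulSoup

url = ''  # set this to the movie ranking page you want to scrape
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on an HTTP error status
response.encoding = 'utf-8'
soup = BeautifulSoup(response.text, 'html.parser')
# Each 'info' block is expected to hold one movie's title and rating.
movie_list = soup.find_all('div', class_='info')
for movie in movie_list:
    title = movie.find('span', class_='title').text.strip()
    rating = movie.find('span', class_='rating_num').text.strip()
    print(title)
    print(rating)

# Test case
# Expected output:
# 肖申克的救赎
# 9.7
# 霸王别姬
# 9.6
# ...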
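
Printing is fine for a quick check, but scraped records usually need to be stored for later analysis. A minimal sketch of writing title and rating pairs as CSV with the standard library follows; `write_movies_csv` and the `movies.csv` file name are hypothetical, and the two sample rows are taken from the expected output above.

```python
import csv
import io


def write_movies_csv(rows, stream) -> int:
    """Write (title, rating) pairs to an open text stream; return the row count."""
    writer = csv.writer(stream)
    writer.writerow(["title", "rating"])
    count = 0
    for title, rating in rows:
        writer.writerow([title, float(rating)])  # store the rating as a number
        count += 1
    return count


rows = [("肖申克的救赎", "9.7"), ("霸王别姬", "9.6")]

# In-memory stream for demonstration; for a real file use
# open("movies.csv", "w", newline="", encoding="utf-8") instead.
buffer = io.StringIO()
write_movies_csv(rows, buffer)
print(buffer.getvalue())
```

Passing an open stream rather than a file name keeps the function easy to test and lets the caller decide where the data goes.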

5. Scraping a music chart

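import requests
from bs4 import BeautifulSoup

url = ''  # set this to the chart page you want to scrape
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on an HTTP error status
response.encoding = 'utf-8'
soup = BeautifulSoup(response.text, 'html.parser')
# 'ttc' and 's-fc8' depend on the target page's markup.
song_list = soup.find_all('div', class_='ttc')
for song in song_list:
    title = song.find('a').text.strip()
    artist = song.find('span', class_='s-fc8').text.strip()
    print(title)
    print(artist)

# Test case
# Expected output:
# <your result>
# <your result>
# ...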

6. Scraping images from a website

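import requests
from bs4 import BeautifulSoup

url = ''  # set this to the page whose images you want to list
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on an HTTP error status
response.encoding = 'utf-8'
soup = BeautifulSoup(response.text, 'html.parser')
image_list = soup.find_all('img')
for image in image_list:
    # Many <img> tags lack src or alt; .get avoids a KeyError in that case.
    src = image.get('src', '')
    alt = image.get('alt', '')
    print(src)
    print(alt)

# Test case
# Expected output:
# <your result>
# <your result>
# ...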
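
The script above only prints each image's address and alt text. To actually save the files, the bytes have to be written to disk under a sensible name. The sketch below covers that last step only and assumes the bytes were already fetched (for example via `requests.get(src).content`); `save_image` and the `images` directory are hypothetical names.

```python
import os
import tempfile
from urllib.parse import urlparse


def save_image(content: bytes, src: str, out_dir: str = "images") -> str:
    """Write image bytes to out_dir, naming the file after the URL path."""
    # Use only the path component so query strings such as '?v=2' are dropped.
    name = os.path.basename(urlparse(src).path) or "unnamed"
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, name)
    with open(path, "wb") as f:  # binary mode: image data is not text
        f.write(content)
    return path


# Demonstration with placeholder bytes and a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_image(b"fake image bytes", "https://example.com/img/logo.png?v=2", tmp)
    print(os.path.basename(saved))  # logo.png
```

Two images with the same file name would overwrite each other in this sketch, so a real downloader would need some way to make names unique.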
