Learning with Attitude
Just last week, one of the company's senior managers came over and told me: young man, work hard, learn more, stop thinking about slacking off, use your brain on the job and improve your efficiency. My inner reaction...
Fine, then let's write something. As it happens, my company works on AI for traffic and needs to classify vehicle labels. Compared with me, a former OEM engineer, my colleagues may not know vehicle classification all that well. Call it fate: I still end up working with cars. So, as a former OEM engineer and FSC (Formula Student China) alumnus, it falls to me to do a bit of popular science. Here's a shot of what my junior schoolmates achieved this year, with a handsome photo of their car, "Xiao Ba". Congratulations to them on finishing 14th nationwide this year.
Just like IG's championship win yesterday: the LPL finally has a Worlds title of its own, the highest honor in LOL. This year, the team's eighth since its founding, they too earned a fine result. Eight years sharpening one sword. "Once they cross this mountain, they will hear your story." Keep going!!!
When it comes to cars, many people know foreign brands better than domestic ones; blame the domestic carmakers for underachieving. That said, today's homegrown brands are steadily closing the gap, and new energy vehicles offer a breakthrough point, so overtaking on the curve is no longer just a dream.
So all the data collected this time covers homegrown brands only; foreign and joint-venture makes are out. First, check the Chinese government website for this year's first-half sales of domestic-brand vehicles. Note that the ranking covers more than passenger cars, which is why you won't find BYD (the NEV leader), Yutong (the bus leader), Beiqi Foton (the commercial-vehicle leader), or Zotye (the "tape-measure department") on the list. My former employer is on it, which is gratifying; friends who know me will know which one...
New energy vehicles differ from fuel vehicles mainly in the powertrain; the body styles are the same, so I'll use fuel vehicles as the scraping example. A side note: developing a new model is no small feat for a carmaker. It usually costs hundreds of millions, and fuel, hybrid, and pure-electric variants are all planned during development. Many of today's "electric cars" are really just hot-selling models with a swapped powertrain (EVs don't sell well, so nobody can afford a dedicated development program; it would be a money-losing business~). Now let's go through the brands one by one!!!
1. SAIC Motor
SAIC's own brands include Roewe, MG, Maxus, Wuling, and Baojun. SAIC is a career goal for many people in the auto industry: a prime location and fairly good pay. Compared with the internet industry, though, it still comes up short.
Take a look at the numbers. In 2017, SAIC's full-year revenue was 857.978 billion CNY with a net profit of 34.41 billion CNY. Tencent's 2017 revenue was 237.76 billion CNY with a net profit of 71.5 billion CNY. That puts SAIC's net margin at about 4% and Tencent's at about 30%. Both are industry giants, so why is the gap so large?
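The two margins above follow directly from the quoted figures; a quick Python sanity check (figures in 100 million CNY, as in the original):

```python
# Net margin = net profit / revenue, using the 2017 full-year figures above
saic_margin = 344.1 / 8579.78
tencent_margin = 715 / 2377.6
print(f'SAIC: {saic_margin:.1%}, Tencent: {tencent_margin:.1%}')  # SAIC: 4.0%, Tencent: 30.1%
```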
SAIC Roewe
import os
import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
url = 'http://www.roewe.com.cn/htmlinclude/header.html'
response = requests.get(url=url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')

Car_Type = ['Car', 'SUV']
for i in [1, 2]:
    folder_path = 'F:/Car/SAIC Motor/roewe/' + Car_Type[i - 1] + '/'
    os.makedirs(folder_path, exist_ok=True)
    # The nav lists for sedans and SUVs are named 'clearfix ul1' / 'clearfix ul2'
    ul = soup.find_all(class_='clearfix ul' + str(i))[0]
    img = ul.find_all(name='img')
    for item in img:
        url = 'http://www.roewe.com.cn' + item['src']
        r = requests.get(url)
        picture_name = url.replace('http://www.roewe.com.cn/images/headernav/', '')
        with open(folder_path + picture_name, 'wb') as f:
            f.write(r.content)
        print(url)
    print('\n\n')
SAIC MG
import os
import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
url = 'http://www.saicmg.com/'
response = requests.get(url=url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')

folder_path = 'F:/Car/SAIC Motor/mg/'
os.makedirs(folder_path, exist_ok=True)
ul = soup.find_all(class_='se_tu')[0]
img = ul.find_all(class_='img100')[0:6]
for item in img:
    url = 'http://www.saicmg.com/' + item['src']
    r = requests.get(url)
    picture_name = url.replace('http://www.saicmg.com/images/', '')
    with open(folder_path + picture_name, 'wb') as f:
        f.write(r.content)
    print(url)
SAIC Maxus
import os
import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
url = 'https://www.saicmaxus.com/'
response = requests.get(url=url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')

ul = soup.find_all(class_='item show clearfix')
Car_Type = ['MPV', 'SUV', 'PICK UP', 'MPV-1', 'MPV-2']
num = 0
for a in ul[:5]:
    num += 1
    folder_path = 'F:/Car/SAIC Motor/maxus/' + Car_Type[num - 1] + '/'
    os.makedirs(folder_path, exist_ok=True)
    img = a.find_all(name='img')
    for item in img:
        url = 'https://www.saicmaxus.com/' + item['src']
        r = requests.get(url)
        # Strip the two URL prefixes the site uses, keeping only the file name
        picture_name = (url.replace('https://www.saicmaxus.com//static/series/', '')
                           .replace('https://www.saicmaxus.com//uploads/month_1712/20171229075', ''))
        with open(folder_path + picture_name, 'wb') as f:
            f.write(r.content)
        print(url)
SAIC Baojun & Wuling
import os
import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
url = 'https://www.sgmw.com.cn/'
response = requests.get(url=url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')

ul = soup.find_all(class_='det_box')
Car_Type = ['SUV', 'MPV', 'Car', 'Mini-Car']
num = 0
for p in ul:
    num += 1
    folder_path = 'F:/Car/SAIC Motor/sgmw/' + Car_Type[num - 1] + '/'
    os.makedirs(folder_path, exist_ok=True)
    for box in p.find_all(class_='itembox'):
        item = box.find_all(class_='item_img')[0].find_all(name='img')[0]
        url = 'https://www.sgmw.com.cn/' + item['src']
        r = requests.get(url)
        # The site scatters images across several paths; strip them all
        picture_name = (url.replace('https://www.sgmw.com.cn/images/childnav/', '')
                           .replace('https://www.sgmw.com.cn/images/', '')
                           .replace('https://www.sgmw.com.cn/hy310w/images/310w/', '')
                           .replace('510/', '')
                           .replace('s3/', ''))
        with open(folder_path + picture_name, 'wb') as f:
            f.write(r.content)
        print(url)
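The four SAIC scripts above all repeat the same create-folder, download, save steps. That shared pattern could be factored into a small helper; a sketch, where `picture_name` and `save_image` are hypothetical names not used in the original scripts:

```python
import os
import requests

def picture_name(url):
    """Derive a file name from an image URL: everything after the last '/'."""
    return url.rstrip('/').rsplit('/', 1)[-1]

def save_image(url, folder, headers=None):
    """Download one image into folder, creating the folder if needed."""
    os.makedirs(folder, exist_ok=True)
    r = requests.get(url, headers=headers)
    path = os.path.join(folder, picture_name(url))
    with open(path, 'wb') as f:
        f.write(r.content)
    return path
```

Each per-brand script would then shrink to a loop that collects image URLs and calls `save_image(url, folder)`.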
2. Changan
Changan is the big brother among homegrown brands. Now that Chery is on the verge of being sold off, where does this big brother go from here?
import os
import re
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
url = 'http://www.changan.com.cn/cache/car_json.js'
response = requests.get(url=url, headers=headers)
result = re.findall('"car_model_photo":"(.*?)","car_model_price_name"', response.text, re.S)

Car_Type = ['Car', 'SUV', 'MPV']
for i in range(3):
    os.makedirs('F:/Car/CHANGAN/' + Car_Type[i] + '/', exist_ok=True)

for j in range(16):
    # The JSON escapes slashes as '\/', so strip the backslashes first
    url = 'http:' + result[j].replace('\\', '')
    r = requests.get(url)
    picture_name = url.replace('http://www.changan.com.cn/uploads/car_model_photo/', '')
    if j < 9:        # entries 0-8 are sedans
        folder = 'F:/Car/CHANGAN/Car/'
    elif j < 15:     # entries 9-14 are SUVs
        folder = 'F:/Car/CHANGAN/SUV/'
    else:            # the rest are MPVs
        folder = 'F:/Car/CHANGAN/MPV/'
    with open(folder + picture_name, 'wb') as f:
        f.write(r.content)
    print(url)
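Since car_json.js carries `\/`-escaped URLs, it looks like JSON under the hood. If the whole file parses as a JSON array, the `json` module would be more robust than a regex. A hypothetical sketch; the `sample` string and its array-of-objects shape are assumptions, not the real file:

```python
import json

# Assumed shape: array of model objects with escaped, protocol-relative photo URLs
sample = '[{"car_model_photo":"\\/\\/www.changan.com.cn\\/uploads\\/car_model_photo\\/a.png","car_model_price_name":"x"}]'
models = json.loads(sample)  # the JSON decoder turns '\/' back into '/'
photos = ['http:' + m['car_model_photo'] for m in models]
print(photos)  # ['http://www.changan.com.cn/uploads/car_model_photo/a.png']
```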
3. Geely
Since acquiring Volvo, Geely has grown at breakneck speed and made the Fortune Global 500. On the one hand, it learned a great deal from Volvo: design, manufacturing, procurement, marketing. On the other hand, it could not have happened without policy support; just look at how Geely is now Daimler's largest shareholder. Where did Geely get that kind of money? It's obvious~
import os
import requests
from lxml import etree

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
url = 'http://www.geely.com/?mz_ca=2071413&mz_sp=7D3ws&mz_kw=8398784&mz_sb=1'
response = requests.get(url=url, headers=headers)
html = etree.HTML(response.text)
result = html.xpath('//div[@class="car"]/img/@src')

Car_Type = ['Car', 'SUV']
for i in range(2):
    os.makedirs('F:/Car/GEELY_AUTO/' + Car_Type[i] + '/', exist_ok=True)

for j in range(17):
    url = result[j]
    r = requests.get(url)
    picture_name = url.replace('https://dm30webimages.geely.com/GeelyOfficial/Files/Car/CarType/', '')
    if 0 < j < 4 or 6 < j < 12 or j == 16:    # these indices are sedans
        folder = 'F:/Car/GEELY_AUTO/Car/'
    elif 3 < j < 7 or 11 < j < 16:            # these indices are SUVs
        folder = 'F:/Car/GEELY_AUTO/SUV/'
    else:
        continue                              # skip non-vehicle images
    with open(folder + picture_name, 'wb') as f:
        f.write(r.content)
    print(url)
To be continued: seven more brands to go. Let me get a sip of water first~