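Copyright notice: This is an original article by the blogger, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.
Article link: https://blog.csdn.net/weixin_44580977/article/details/102056198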
The urllib.request module
from urllib import request
resp = request.urlopen("http://image.baidu.com/")
print(resp.read().decode())
<!DOCTYPE html> <!--STATUS OK--> <head> <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/> <meta name="description" content="百度图片使用世界前沿的人工智能技术,为用户甄选海量的高清美图,用更流畅、更快捷、更精准的搜索体验,带你去发现多彩的世界。"> <meta http-equiv="X-UA-Compatible" content="IE=Edge"/> <meta name="baidu-site-verification" content="2ltGWMzql9"/> <script>
var bdimgdata = {
logid: '11007013867272265913',
sid: 'dc1c38881068b98784a4a5fc83d5a92f6b2743ee',
wh: window.screen.width + 'x' + window.screen.height,
sampid: '-1',
protocol: window.location.protocol.replace(':', ''),
spat: 0 + '-' + ''
}
......
Use the read() method to get the response body. The body comes back as bytes, so call decode() to turn it into a string.
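Calling decode() with no argument assumes UTF-8, which happens to match the page above. As a minimal sketch of decoding with the charset the server declares instead, the helpers below read it from a Content-Type header value. The names `charset_from_content_type` and `decode_body` are hypothetical and not part of urllib.

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset named in a Content-Type header value, or `default`."""
    msg = Message()
    if content_type:
        msg["Content-Type"] = content_type
    # get_content_charset() returns None when no charset parameter is present
    return msg.get_content_charset() or default


def decode_body(raw, content_type=None):
    """Decode response bytes using the declared charset (hypothetical helper)."""
    charset = charset_from_content_type(content_type)
    # errors="replace" substitutes a marker for bytes that do not fit the charset
    return raw.decode(charset, errors="replace")
```

With urllib.request, the same charset lookup is available directly as resp.headers.get_content_charset(), so decode_body(resp.read(), resp.headers.get("Content-Type")) would be one way to use the sketch.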
The urllib3 library
urllib3 is the library recommended here.
import urllib3
http = urllib3.PoolManager()
resp_dat = http.request('GET', "http://image.baidu.com/")
print(resp_dat.data.decode())
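The snippet above decodes the body without looking at the HTTP status. A small sketch of checking `status` before decoding `data` could look like the following. `decode_response` is a hypothetical helper, and the `SimpleNamespace` object is only a stand-in for a real urllib3 response so the example runs without network access.

```python
from types import SimpleNamespace


def decode_response(resp, encoding="utf-8"):
    """Return the body of a urllib3-style response as text.

    `resp` only needs a `.status` (int) and a `.data` (bytes) attribute,
    the two attributes the article reads from the urllib3 response.
    """
    if resp.status != 200:
        raise RuntimeError(f"unexpected HTTP status {resp.status}")
    return resp.data.decode(encoding)


# Stand-in response object (an assumption for illustration, not a real request)
fake_resp = SimpleNamespace(status=200, data="百度图片".encode("utf-8"))
text = decode_response(fake_resp)
```

In real use the argument would be the object returned by http.request('GET', url).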
Practical example
Scraping stock data from Eastmoney (东方财富网)
import re

import numpy as np
import pandas as pd
import urllib3

# Fetch the industry sector (行业板块) data, one page at a time
http = urllib3.PoolManager()
pages = 4
conts = []
# Each record starts with "BK" and runs up to the next double quote
pattern = re.compile(r'BK(.*?)"')
for p in range(1, pages + 1):
    url = "http://nufm.dfcfw.com/EM_Finance2014NumericApplication/JS.aspx?cb=jQuery1124012582582823807198_1554554782636&type=CT&token=4f1862fc3b5e77c150a2b985b12db0fd&sty=FPGBKI&js=({data:[(x)],recordsFiltered:(tot)})&cmd=C._BKHY&st=(ChangePercent)&sr=-1&p=%d" % p
    url += "&ps=20&_=1554554783027"
    resp_dat = None
    try:
        resp_dat = http.request('GET', url)
        text = resp_dat.data.decode()
        bk_list = re.findall(pattern, text)
        for bk in bk_list:
            conts.append(bk)
        print(text)
    except Exception as e:
        # resp_dat is still None if the request itself failed
        if resp_dat is not None:
            print(resp_dat.status)
        print(e)
print(conts)
# Keep only some of the fields
# dtype=object lets the string fields be assigned into the zero-filled frame
df = pd.DataFrame(np.zeros((len(conts), 7)), columns=['板块名称', 'BK涨跌幅', '总市值', '换手率', '涨跌家数', '领涨股票', 'SK涨跌幅'], dtype=object)
for num, bk_dat in enumerate(conts):
    bk_dat = bk_dat.split(',')
    df.loc[df.index[num], '板块名称'] = bk_dat[1]   # sector name
    df.loc[df.index[num], 'BK涨跌幅'] = bk_dat[2]   # sector change (%)
    df.loc[df.index[num], '总市值'] = bk_dat[3]     # total market cap
    df.loc[df.index[num], '换手率'] = bk_dat[4]     # turnover rate
    df.loc[df.index[num], '涨跌家数'] = bk_dat[5]   # number of gainers/losers
    df.loc[df.index[num], '领涨股票'] = bk_dat[8]   # leading stock
    df.loc[df.index[num], 'SK涨跌幅'] = bk_dat[10]  # leading stock change (%)
df.to_csv("table-bk.csv", columns=df.columns, index=True, encoding='gb2312')
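The parsing step can be tried without hitting the site. The sketch below applies the same regular expression and the same field positions (1, 2, 3, 4, 5, 8, 10) to a made-up payload. The sample string and the `parse_sectors` name are assumptions for illustration only; the real response format may differ.

```python
import re

# Same pattern as in the scraping loop: text between "BK" and the next quote
RECORD_RE = re.compile(r'BK(.*?)"')

# Column name -> position in the comma-separated record, as used in the article
FIELDS = {
    "板块名称": 1,
    "BK涨跌幅": 2,
    "总市值": 3,
    "换手率": 4,
    "涨跌家数": 5,
    "领涨股票": 8,
    "SK涨跌幅": 10,
}


def parse_sectors(text):
    """Extract one dict per sector record (hypothetical helper)."""
    rows = []
    for record in RECORD_RE.findall(text):
        parts = record.split(",")
        # Skip records too short to contain every field we index into
        if len(parts) <= max(FIELDS.values()):
            continue
        rows.append({name: parts[i] for name, i in FIELDS.items()})
    return rows


# Invented sample payload, NOT real Eastmoney data
sample = 'cb({data:["1,BK0001,SectorA,1.23,1000,0.5,10|2,x,y,StockA,z,2.34"],recordsFiltered:1})'
rows = parse_sectors(sample)
```

A list of dicts like `rows` can be passed straight to pd.DataFrame(rows), which avoids pre-allocating a zero-filled frame and filling it cell by cell.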