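The Zhengfang (正方) academic affairs system seems to have been upgraded recently, and the code circulating online no longer works. The likely cause is that the cookie and the CAPTCHA get out of sync: every simulated login to the new address just comes back with an "Object moved to here" page. Below is code that uses the requests module to simulate a login to the academic affairs system and fetch the course timetable. (The timetable is printed as-is; it is not written to Excel and not prettified.)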
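To make the cookie/CAPTCHA point concrete, here is a toy model. It assumes the server remembers the expected CAPTCHA per session cookie, which is a guess at the behaviour rather than Zhengfang's confirmed logic. `ToyServer`, `get_captcha` and `login` are hypothetical names that exist only in this sketch.

```python
from itertools import count


class ToyServer:
    """Toy stand-in for a server that ties each CAPTCHA to a session cookie."""

    def __init__(self):
        self._expected = {}      # session cookie -> CAPTCHA text issued for it
        self._counter = count(1)  # deterministic ids so the demo is repeatable

    def get_captcha(self, cookie=None):
        """Issue a CAPTCHA; hand out a new session cookie if none was sent."""
        n = next(self._counter)
        if cookie is None:
            cookie = f"sid-{n}"
        code = f"{n:04d}"
        self._expected[cookie] = code
        return cookie, code

    def login(self, cookie, code):
        """Accept the login only if the code matches this cookie's CAPTCHA."""
        return cookie is not None and self._expected.get(cookie) == code


server = ToyServer()

# Same cookie for the CAPTCHA request and the login request: accepted.
cookie, code = server.get_captcha()
same_session_ok = server.login(cookie, code)

# CAPTCHA fetched in one session, login sent from another: rejected.
_, stale_code = server.get_captcha()
other_cookie, _ = server.get_captcha()
mixed_session_ok = server.login(other_cookie, stale_code)

print(same_session_ok, mixed_session_ok)  # True False
```

Under that assumption, the fix is simply to send the login page request, the CAPTCHA request and the login POST through one shared session object, which is what the script does with `requests.session()`.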
The whole script is about 60 lines. Remember to fill in your own student number and password.
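If you would rather not type the account into the source file, one possible approach is to read it from the environment and prompt for whatever is missing. This is only a sketch: the variable names `JW_STUDENT_NUMBER` and `JW_PASSWORD` and the helper `load_credentials` are made up for this example.

```python
import os
from getpass import getpass


def load_credentials(env=os.environ, ask=input, ask_secret=getpass):
    """Return (student_number, password), prompting for whatever is missing.

    JW_STUDENT_NUMBER and JW_PASSWORD are hypothetical variable names.
    """
    student_number = env.get("JW_STUDENT_NUMBER") or ask("Student number: ")
    password = env.get("JW_PASSWORD") or ask_secret("Password: ")
    return student_number.strip(), password
```

In the script this would be used as `studentnumber, password = load_credentials()` in place of the two hard-coded strings.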
Zhengfang MIS deployments almost always serve the login page at http://<server address>/default2.aspx
The CAPTCHA image is served at http://<server address>/CheckCode.aspx?
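Since only the server address changes between schools, the two URLs can be derived from it. A minimal sketch, where the helper name `zhengfang_urls` is hypothetical and `jw3.edu.cn` is the host used in the script:

```python
def zhengfang_urls(server):
    """Build the login and CAPTCHA URLs for one server address."""
    base = server.rstrip("/")
    if "://" not in base:
        base = "http://" + base  # assume plain HTTP when no scheme is given
    return {
        "login": base + "/default2.aspx",
        "captcha": base + "/CheckCode.aspx?",
    }


urls = zhengfang_urls("jw3.edu.cn")
print(urls["login"])    # http://jw3.edu.cn/default2.aspx
print(urls["captcha"])  # http://jw3.edu.cn/CheckCode.aspx?
```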
Code:
from lxml import etree
import requests
studentnumber = "*******"
password = "*******"
s = requests.session()
url = "http://jw3.edu.cn/default2.aspx"
response = s.get(url)
selector = etree.HTML(response.content)
__VIEWSTATE = selector.xpath('//*[@id="form1"]/input/@value')[0]
imgUrl = "http://jw3.edu.cn/CheckCode.aspx?"
imgresponse = s.get(imgUrl, stream=True)
image = imgresponse.content
try:
    # Save the CAPTCHA image locally so it can be opened and read by eye
    with open('C://Users//dell//desktop//1.jpg', "wb") as jpg:
        jpg.write(image)
except IOError:
    print("IO Error\n")
code = input("CAPTCHA: ")
data = {
    "__VIEWSTATE": __VIEWSTATE,
    "txtUserName": studentnumber,
    "TextBox2": password,
    "txtSecretCode": code,
    "Button1": "",
}
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36",
}
response = s.post(url, data=data, headers=headers)
def getInfor(response, xpath):
    content = response.content.decode('gb2312')  # the page source is gb2312-encoded, so decode it first
    selector = etree.HTML(content)
    infor = selector.xpath(xpath)[0]
    return infor
text = getInfor(response, '//*[@id="xhxm"]/text()')
text = text.replace(" ", "")
print("Hello ", text)
# text[:-2] drops the last two characters of the greeting, leaving just the name
kburl = "http://jw3.edu.cn/xskbcx.aspx?xh=" + studentnumber + "&xm=" + text[:-2] + "&gnmkdm=N121603"
print(kburl)
headers = {
    "Referer": "http://jw3.edu.cn/xs_main.aspx?xh=E21614061",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36",
}
response = s.get(kburl, headers=headers)
html = response.content.decode("gb2312")
print(html)
selector = etree.HTML(html)
content = selector.xpath('//*[@id="Table1"]/tr/td/text()')
for each in content:
    print(each)
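The final loop prints every cell text on its own line. As a possible next step, the cells can be grouped by table row. The sketch below uses the standard library's `html.parser` rather than lxml so that it runs on its own. The class name `TableRows` and the sample HTML are invented for illustration, and the real `Table1` markup may differ (for example nested tables or `rowspan` cells, which this sketch does not handle).

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td> in the table with a given id, row by row."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.rows = []
        self._in_table = False
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "table" and dict(attrs).get("id") == self.table_id:
            self._in_table = True
        elif self._in_table and tag == "tr":
            self.rows.append([])          # start a new row
        elif self._in_table and tag == "td" and self.rows:
            self._in_cell = True
            self.rows[-1].append("")      # start a new, empty cell

    def handle_endtag(self, tag):
        if tag == "table":
            self._in_table = False
        elif tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        text = data.strip()
        if self._in_cell and text:
            # Join text fragments split by tags such as <br> with a space
            cell = self.rows[-1][-1]
            self.rows[-1][-1] = f"{cell} {text}" if cell else text


# Invented sample; the real page would be the decoded `html` string.
sample = """
<table id="Table1">
  <tr><td>Period</td><td>Monday</td></tr>
  <tr><td>1</td><td>Calculus<br>Room 101</td></tr>
</table>
"""
parser = TableRows("Table1")
parser.feed(sample)
for row in parser.rows:
    print(" | ".join(row))
```

With the rows available as lists, writing them to a CSV or Excel file becomes a separate, straightforward step.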
Result: