Python crawler based on the requests library: data and headers are both submitted, but login fails
伊谢尔伦 2017-04-17 17:54:20
[Python discussion group]

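I want to use requests to simulate logging in to my school's academic affairs website and then write a program that grabs courses automatically, but I can't even get past the login _(:зゝ∠)_, even though I have filled in pretty much all of the Data and Header fields.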

The code is as follows:

# __author__ = ''
# -*- coding: utf-8 -*-

import requests
from time import sleep
with requests.session() as s:
    login_url = "https://cas.gzhu.edu.cn/cas_server/login;jsessionid=4CEEEE1C74B97277272DAAF0A4073B0D?service=https://cas.gzhu.edu.cn:443/shunt/index.jsp"
    Data = {
        "username": "**********",
        "password": "******",
        "captcha": "",
        "execution": "e2s1",
        "warn": "true",
        "_eventId": "submit",
    }

    Header = {
        "Host": "cas.gzhu.edu.cn",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3",
        "Accept-Encoding": "gzip, deflate, br",
        "Referer": "https://cas.gzhu.edu.cn/cas_server/login?service=https://cas.gzhu.edu.cn:443/shunt/index.jsp",
        "Connection": "keep-alive"
    }

    # Login fails
    new = s.post(login_url, data=Data, headers=Header)
    print(new.text)

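It took me a long time to realize that I was not actually logged in. Requests to the later URLs fail with a redirect loop, and I don't know whether that is because the login did not succeed. Also, a lot of our site's pages answer with a 302 temporary redirect, which is just so low!!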

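PS: Finals are coming up and I'm still fooling around with this, so I will probably fail a course. I hope someone here can show me what I did wrong, otherwise this is a huge loss T.T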


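Update: the login problem seems to be solved. The lt and JSESSIONID fields have to be extracted from the login page first. But now, when I request the target URL, I get this error:

requests.exceptions.TooManyRedirects: Exceeded 30 redirects.

Before I commented out Host, there was also a different error: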

requests.exceptions.ConnectionError: HTTPConnectionPool(host='cas.gzhu.edu.cn', port=80): Max retries exceeded with url: /c/portal/login;jsessionid=78B851574390A3C4080A3AD99E18E20D?p_l_id=96998&_58_redirect=%2F (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x026CAB10>: Failed to establish a new connection: [Errno 10061] ',))

I'll keep trying.
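For the TooManyRedirects error above, one way to see which URLs the session is bouncing between is to turn off automatic redirects and follow each Location header by hand. This is only a rough sketch, assuming Python 3 and a requests-style session object passed in by the caller; the helper name trace_redirects and the max_hops parameter are made up for illustration.

```python
from urllib.parse import urljoin

REDIRECT_CODES = (301, 302, 303, 307, 308)


def trace_redirects(session, url, max_hops=10):
    """Follow redirects one hop at a time and return every URL visited.

    `session` is assumed to behave like requests.Session: its get() accepts
    allow_redirects and returns an object with status_code and headers.
    """
    visited = [url]
    for _ in range(max_hops):
        response = session.get(url, allow_redirects=False)
        if response.status_code not in REDIRECT_CODES:
            break  # reached a final, non-redirect response
        location = response.headers.get("Location")
        if not location:
            break  # redirect status without a target, nothing to follow
        # Location may be relative, so resolve it against the current URL
        url = urljoin(url, location)
        if url in visited:
            visited.append(url)
            break  # the same URL came up twice: this is the loop
        visited.append(url)
    return visited
```

Calling trace_redirects(s, target_url) with the logged-in session and printing the result shows the chain hop by hop, including any switch between https and http or between hosts.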

The updated code is as follows:


import requests
import re
from time import sleep
with requests.session() as s:
    login_url = "https://cas.gzhu.edu.cn/cas_server/login"
    first = s.get(login_url)
    lt = re.findall(r'name="lt" value="(.*?)"', first.text)
    print(lt[0])
    cookie = first.headers["Set-Cookie"]
    # print(cookie)
    Js = re.findall(r"(JSESSIONID=.*?);", cookie)
    print(Js[0])
    s.headers = {
        # "Host": "cas.gzhu.edu.cn",
        "User-Agent": "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3",
        "Accept-Encoding": "gzip, deflate, br",
        "Referer": "https://cas.gzhu.edu.cn/cas_server/login?service=https://cas.gzhu.edu.cn:443/shunt/index.jsp",
        "Cookie": Js[0],
        "Connection": "keep-alive"
    }
    Data = {
        "username": "",
        "password": "",
        "captcha": "",
        "warn": "true",
        "lt": lt[0],  # re.findall returns a list; send the first match
        "execution": "e1s1",
        "_eventId": "submit",
    }

    # Log in
    new = s.post(login_url, data=Data)
    print(new.url)

All replies (2)
迷茫
# coding=utf-8

import requests
from pyquery import PyQuery as Q

session = requests.Session()
session.headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.63 Safari/537.36'
}

login_url = "https://cas.gzhu.edu.cn/cas_server/login"

data = {
    'username': 'XXX',
    'password': 'XXX',
    'submit': '登录'
}

# Collect the hidden form parameters from the login page
r = session.get(login_url)
for _ in Q(r.text).find('input[type="hidden"]'):
    data[Q(_).attr('name')] = Q(_).val()

# Login step 1 (CAS)
session.post(login_url, data)

# Login step 2 (target site)
session.get('http://202.192.18.182/Login_gzdx.aspx')

r = session.get('http://202.192.18.182/xf_xstyxk.aspx?xh=1506100007')
print(r.text)
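The loop in this answer copies every hidden input on the login page into the POST data, which is how fields such as lt and execution reach the server without being hard-coded. Here is a rough sketch of the same idea using only the standard library, assuming Python 3; HiddenInputParser, extract_hidden_fields and the sample HTML are made up for illustration and are not taken from the real login page.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        input_type = (attributes.get("type") or "").lower()
        name = attributes.get("name")
        if input_type == "hidden" and name:
            # A hidden input without a value attribute is sent as an empty string
            self.fields[name] = attributes.get("value") or ""


def extract_hidden_fields(html):
    """Return a dict of all hidden form fields found in the given HTML text."""
    parser = HiddenInputParser()
    parser.feed(html)
    return parser.fields


# Hypothetical login form fragment, only for demonstration
sample_html = (
    '<form>'
    '<input type="hidden" name="lt" value="LT-123-abc" />'
    '<input type="hidden" name="execution" value="e1s1">'
    '<input type="text" name="username">'
    '</form>'
)
hidden_fields = extract_hidden_fields(sample_html)
```

With a real session, the returned dict could be merged into the login data with data.update(extract_hidden_fields(r.text)) before posting.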
阿神

You should add the Cookie to the headers as well; only then does the login have a chance of succeeding!
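Worth noting: a requests.Session stores the cookies it receives and sends them back on later requests by itself, so copying the Cookie header by hand is often unnecessary. If the JSESSIONID value does need to be inspected, the standard library can parse it instead of a regex. This is a small sketch, assuming Python 3 and a single Set-Cookie header value; the function name get_cookie_value and the example header are made up for illustration.

```python
from http.cookies import SimpleCookie


def get_cookie_value(set_cookie_header, name):
    """Return the value of cookie `name` from one Set-Cookie header, or None."""
    jar = SimpleCookie()
    jar.load(set_cookie_header)
    morsel = jar.get(name)
    return morsel.value if morsel is not None else None


# Hypothetical header value, shaped like what a Java server usually sends
example_header = "JSESSIONID=0123456789ABCDEF; Path=/cas_server; HttpOnly"
session_id = get_cookie_value(example_header, "JSESSIONID")
```

When several cookies are folded into one header string, this simple parser may not split them correctly, so reading session.cookies is usually the safer route.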
