Logging in with urllib2, BeautifulSoup, and CookieJar.

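Capturing the traffic with Charles shows the URL that the login form POSTs to: https://passport.csdn.net/account/login

Now let's look at the POST parameters:

Whoa, the password is sent in plaintext here..

username, password, and _eventId are straightforward, but where do lt and execution come from?

Let's look at the HTML returned by the GET request immediately before the POST. It even still contains comments: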

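OK, to summarize the steps:

  1. Send a GET request to https://passport.csdn.net/account/login and extract lt and execution from the returned HTML;
  2. Build the form;
  3. Send a POST request carrying the form.

The code is as follows:

Getting lt and execution: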

import cookielib
import urllib2
from bs4 import BeautifulSoup

cookie = cookielib.MozillaCookieJar('cookie.txt')  # MozillaCookieJar can save cookies to a file
cookie_handler = urllib2.HTTPCookieProcessor(cookie)
opener = urllib2.build_opener(cookie_handler)
# prepare for login
response = opener.open('https://passport.csdn.net/account/login')
data = response.read()  # read the body once; a second read() would return an empty string
lt = ""
execution = ""
bs = BeautifulSoup(data, "lxml")
for field in bs.form.find_all('input'):
    if field.get('name') == 'lt':
        lt = field.get('value')
    if field.get('name') == 'execution':
        execution = field.get('value')
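The extraction logic can be tried offline, without hitting the login page. The sketch below is written for Python 3 and swaps BeautifulSoup for the standard library's html.parser so that it has no dependencies. The HTML fragment, the token values, and the names HiddenInputParser and extract_login_tokens are all made up for illustration; the real page will look different.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name -> value for every <input> tag in the document."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            attr = dict(attrs)
            if "name" in attr:
                self.fields[attr["name"]] = attr.get("value") or ""


def extract_login_tokens(html):
    """Return (lt, execution); each is "" if the field is missing."""
    parser = HiddenInputParser()
    parser.feed(html)
    return parser.fields.get("lt", ""), parser.fields.get("execution", "")


# Hypothetical markup, only shaped like a login form with hidden tokens.
SAMPLE_HTML = """
<form id="fm1" action="/account/login" method="post">
  <input type="text" name="username" />
  <input type="password" name="password" />
  <!-- hidden tokens the server expects to get back -->
  <input type="hidden" name="lt" value="LT-0000-example" />
  <input type="hidden" name="execution" value="e1s1" />
  <input type="hidden" name="_eventId" value="submit" />
</form>
"""

lt, execution = extract_login_tokens(SAMPLE_HTML)
print(lt, execution)
```

Returning empty strings for missing fields mirrors the defaults in the snippet above; raising an error instead would make a changed page layout easier to notice.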

The form:

import urllib

# lt and execution are the values extracted from the login page above
post_data = urllib.urlencode({
    'username': 'xxxx',
    'password': 'xxxx',
    'lt': lt,
    'execution': execution,
    '_eventId': 'submit'
})
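To see what the encoded form body looks like, here is a small Python 3 sketch (where urlencode lives in urllib.parse). The helper name build_login_form and all the values are placeholders, not real credentials or tokens. Note that in Python 3 the POST body has to be bytes, hence the final encode.

```python
from urllib.parse import urlencode, parse_qs


def build_login_form(username, password, lt, execution):
    """URL-encode the five login fields and return them as bytes."""
    fields = {
        "username": username,
        "password": password,
        "lt": lt,
        "execution": execution,
        "_eventId": "submit",
    }
    # urlencode percent-escapes reserved characters, so the result is pure ASCII.
    return urlencode(fields).encode("ascii")


# Placeholder values; the password deliberately contains characters that need escaping.
body = build_login_form("someone", "p@ss word&1", "LT-0000-example", "e1s1")
print(body)

# Decoding the body again recovers the original values.
decoded = parse_qs(body.decode("ascii"))
print(decoded["password"])
```

Special characters such as `&` or a space inside the password are escaped by urlencode, so they cannot be mistaken for field separators.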

POST URL:

login_url = 'https://passport.csdn.net/account/login'

POST:

request = urllib2.Request(url=login_url, data=post_data)
try:
    response = opener.open(request)
except urllib2.HTTPError as e:
    # print the error page body to see why the server rejected the request
    print e.read()
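The snippets here create a MozillaCookieJar bound to cookie.txt but never call save(), so nothing is actually written to disk. As a rough Python 3 illustration (http.cookiejar is the Python 3 name for cookielib), the sketch below saves a jar to a file and loads it back. The cookie itself is hand-built with a made-up name and value so the example runs offline; after a real login the jar would already hold the cookies set by the server.

```python
import os
import tempfile
import time
from http.cookiejar import Cookie, MozillaCookieJar


def make_cookie(name, value, domain):
    """Build a cookie by hand (hypothetical data, for demonstration only)."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path="/", path_specified=True,
        secure=False, expires=int(time.time()) + 3600, discard=False,
        comment=None, comment_url=None, rest={},
    )


# Use a temporary directory so the demo does not touch a real cookie.txt.
path = os.path.join(tempfile.mkdtemp(), "cookie.txt")

jar = MozillaCookieJar(path)
jar.set_cookie(make_cookie("SESSION_EXAMPLE", "abc123", ".csdn.net"))
# ignore_discard / ignore_expires keep session cookies that would otherwise be skipped.
jar.save(ignore_discard=True, ignore_expires=True)

# A later run can restore the cookies instead of logging in again.
restored = MozillaCookieJar(path)
restored.load(ignore_discard=True, ignore_expires=True)
restored_names = [c.name for c in restored]
print(restored_names)
```

In the Python 2 code of this post, the equivalent call would be cookie.save(ignore_discard=True, ignore_expires=True) after a successful login.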

Verifying with a GET request to another URL:

Note that this request has to carry headers; otherwise the server responds with 403 Forbidden.

headers_data = {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:42.0) Gecko/20100101 Firefox/42.0'
}
request = urllib2.Request(url='http://blog.csdn.net/attach_114', headers=headers_data)
print opener.open(request).read()
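The 403 presumably comes from the default User-Agent, which identifies the client as Python-urllib. The Python 3 sketch below (urllib.request replaces urllib2) shows two ways to set a browser-like User-Agent: once on the opener so every request carries it, or per request as in the snippet above. It only builds the objects and sends nothing over the network.

```python
import urllib.request

USER_AGENT = ("Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:42.0) "
              "Gecko/20100101 Firefox/42.0")

opener = urllib.request.build_opener()

# What urllib sends when no User-Agent is set: "Python-urllib/<version>".
default_headers = list(opener.addheaders)
print(default_headers)

# Option 1: set the header once on the opener; it is added to every request.
opener.addheaders = [("User-Agent", USER_AGENT)]

# Option 2: set it on a single request.
request = urllib.request.Request(
    "http://blog.csdn.net/attach_114",
    headers={"User-Agent": USER_AGENT},
)
# Request stores header names capitalized as "User-agent".
per_request_agent = request.get_header("User-agent")
print(per_request_agent)
```

Setting the header on the opener avoids repeating headers_data for every page fetched after login.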

Complete code:

#!/usr/bin/env python
#coding:utf-8
import urllib
import urllib2
import cookielib
from bs4 import BeautifulSoup
from pass_csdn import username,password

cookie = cookielib.MozillaCookieJar('cookie.txt')
cookie_handler = urllib2.HTTPCookieProcessor(cookie)
opener = urllib2.build_opener(cookie_handler)

def login():
    # prepare for login
    response = opener.open('https://passport.csdn.net/account/login')
    lt = ""
    execution = ""
    bs = BeautifulSoup(response.read(), "lxml")
    for field in bs.form.find_all('input'):
        if field.get('name') == 'lt':
            lt = field.get('value')
        if field.get('name') == 'execution':
            execution = field.get('value')
    post_data = urllib.urlencode({
        'username': username,
        'password': password,
        'lt': lt,
        'execution': execution,
        '_eventId': 'submit'
    })
    # defined here but not passed to the login request below
    headers_data = {
        'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:42.0) Gecko/20100101 Firefox/42.0'
    }
    login_url_with_jsession = 'https://passport.csdn.net/account/login'
    request = urllib2.Request(url=login_url_with_jsession, data=post_data)
    try:
        response = opener.open(request)
    except urllib2.HTTPError as e:
        print e.read()

if __name__ == '__main__':
    login()
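urllib2 and cookielib exist only in Python 2. For readers on Python 3, here is a rough port of the same flow, offered as a sketch rather than a tested replacement: urllib.request and http.cookiejar take over from urllib2 and cookielib, and the standard library's html.parser stands in for BeautifulSoup to keep it dependency-free. The helper names make_opener and _InputCollector are mine, and the opener is passed in as a parameter so the function can be exercised with a stub instead of the live site.

```python
import urllib.request
from html.parser import HTMLParser
from http.cookiejar import MozillaCookieJar
from urllib.parse import urlencode

LOGIN_URL = "https://passport.csdn.net/account/login"


class _InputCollector(HTMLParser):
    """Collect name -> value for every <input> tag on the page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            attr = dict(attrs)
            if attr.get("name"):
                self.fields[attr["name"]] = attr.get("value") or ""


def make_opener(cookie_file="cookie.txt"):
    """Return (opener, jar); the opener stores cookies in the jar."""
    jar = MozillaCookieJar(cookie_file)
    handler = urllib.request.HTTPCookieProcessor(jar)
    return urllib.request.build_opener(handler), jar


def login(opener, username, password, url=LOGIN_URL):
    """GET the login page, pull out lt/execution, then POST the form."""
    # Assumes the page decodes as UTF-8; undecodable bytes are replaced.
    page = opener.open(url).read().decode("utf-8", errors="replace")
    collector = _InputCollector()
    collector.feed(page)
    form = urlencode({
        "username": username,
        "password": password,
        "lt": collector.fields.get("lt", ""),
        "execution": collector.fields.get("execution", ""),
        "_eventId": "submit",
    }).encode("ascii")  # POST data must be bytes in Python 3
    return opener.open(urllib.request.Request(url, data=form))
```

A typical call would be `opener, jar = make_opener()` followed by `login(opener, username, password)`, with the credentials imported from a local module as in the script above.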