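Python's urllib2 and cookielib modules
- When there is no CAPTCHA
Take logging in to Renren (人人网) as an example: the first few login attempts do not ask for a CAPTCHA. URL-encode the username and password with urllib.urlencode, create a cookie container with cookielib.CookieJar(), build an opener with opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj)), and then post to the login page with req = opener.open(login_page, post_data). A working example: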
import re
import urllib
import urllib2
import cookielib


def login_func():
    login_page = "http://www.renren.com/ajaxLogin/login"
    data = {'email': 'your_email', 'password': 'your_password'}
    post_data = urllib.urlencode(data)
    # The CookieJar keeps the session cookies set by the login response
    cj = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    # Make this opener the default, so urllib2.urlopen() reuses the cookies
    urllib2.install_opener(opener)
    print u"Logging in to Renren"
    req = opener.open(login_page, post_data)
    req = urllib2.urlopen("http://www.renren.com/home")
    html = req.read()
    # Pull the numeric user id out of the home page
    uid = re.search(r"'ruid':'(\d+)'", html).group(1)
    print u"Login succeeded"
    return uid
import urllib2
import urllib
import cookielib

data = {"email": "your_mail", "password": "your_passwd"}
post_data = urllib.urlencode(data)
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
req = urllib2.Request("http://www.renren.com/PLogin.do", post_data)
content = opener.open(req)
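The snippets in this post target Python 2. As a rough sketch of the same flow on Python 3, where urllib2 became urllib.request and cookielib became http.cookiejar, something like the following should work. The helper names (encode_credentials, build_cookie_opener, login) are made up for illustration, and the login URL is simply the one quoted in this post, which may no longer accept this kind of login.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Endpoint copied from the post; treat it as a placeholder.
LOGIN_URL = "http://www.renren.com/PLogin.do"


def encode_credentials(email, password):
    """URL-encode the form fields; Python 3 requires POST data as bytes."""
    fields = {"email": email, "password": password}
    return urllib.parse.urlencode(fields).encode("ascii")


def build_cookie_opener():
    """Return an opener plus the CookieJar that stores its cookies."""
    cookie_jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(cookie_jar)
    )
    return opener, cookie_jar


def login(email, password, login_url=LOGIN_URL):
    """POST the credentials; cookies set by the response end up in the jar."""
    opener, cookie_jar = build_cookie_opener()
    with opener.open(login_url, encode_credentials(email, password), timeout=10) as response:
        response.read()
    return opener, cookie_jar
```

The opener returned by login() can then be reused for later requests, since every request made through it sends back the cookies held in the jar.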
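- How to bypass the CAPTCHA
Log in manually first, capture the cookie with the browser console or Fiddler, and then add it to the request headers. There are two ways to add a header: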
Use the headers argument to the Request constructor, or:
import urllib2

req = urllib2.Request('http://www.example.com/')
req.add_header('Referer', 'http://www.python.org/')
r = urllib2.urlopen(req)
OpenerDirector automatically adds a User-Agent header to every Request. To change this:
import urllib2

opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
opener.open('http://www.example.com/')
Also, remember that a few standard headers (Content-Length, Content-Type and Host) are added when the Request is passed to urlopen() (or OpenerDirector.open()).
After that, everything else works much as usual.
References:
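The documentation excerpts above set Referer and User-agent. A captured cookie can be attached in the same ways. Here is a minimal sketch using the Python 3 names (urllib.request in place of urllib2); the cookie value and the function names are placeholders invented for this example, not anything from a real session.

```python
import urllib.request

# Placeholder: paste the Cookie value captured from the browser or Fiddler.
COOKIE_STRING = "sessionid=abc123; token=xyz"


def request_with_headers_argument(url, cookie_string):
    """Way 1: pass the cookie through the headers argument of Request."""
    return urllib.request.Request(url, headers={"Cookie": cookie_string})


def request_with_add_header(url, cookie_string):
    """Way 2: build the Request first, then call add_header()."""
    req = urllib.request.Request(url)
    req.add_header("Cookie", cookie_string)
    return req


def opener_with_cookie(cookie_string, user_agent="Mozilla/5.0"):
    """Alternative: set the headers once on the opener for every request."""
    opener = urllib.request.build_opener()
    opener.addheaders = [("User-agent", user_agent), ("Cookie", cookie_string)]
    return opener
```

Passing either Request object to urllib.request.urlopen(), or calling open() on the opener, sends the cookie along with the request. Whether the site accepts it depends on the session still being valid on the server side.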
- 20.6. urllib2 — extensible library for opening URLs — Python 2.7.8 documentation
- python: urllib2 how to send cookie with urlopen request – Stack Overflow
Other
- http://cn.bing.com/search?q=python+%E5%B7%B2%E6%9C%89cookie+%E7%99%BB%E5%BD%95
- http://search.aol.com/aol/search?q=Python+existing+cookie+login+web
- http://search.aol.com/aol/search?q=Python+cookieJar
- https://docs.python.org/2/library/urllib2.html
- https://docs.python.org/2/library/cookielib.html
- http://stackoverflow.com/questions/6878418/putting-a-cookie-in-a-cookiejar
- http://stackoverflow.com/questions/2169281/how-to-add-cookie-to-existing-cookielib-cookiejar-instance-in-python
In summary:
A cookie is just one of the HTTP request headers, so it can be added to the headers directly, in the same way as User-Agent.
6 comments on "Python simulated login tips"
urlencode and urldecode in Python
http://blog.csdn.net/haoni123321/article/details/15814111/
`
urllib.urlencode({'tag': '魔兽'}) # URL-encode a dict
urllib.quote('魔兽') # URL-encode a string
`
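For comparison, here is a small sketch of the same two calls on Python 3, where they live in urllib.parse, plus the reverse direction with unquote and parse_qs. The variable names are only for illustration.

```python
from urllib.parse import parse_qs, quote, unquote, urlencode

# Encode a dict into a query string (non-ASCII text is UTF-8 percent-encoded).
query = urlencode({"tag": "魔兽"})

# Encode a single string.
encoded = quote("魔兽")

# Decode back to the original text.
decoded = unquote(encoded)

# Parse a query string back into a dict of lists.
parsed = parse_qs(query)
```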
URL encoding and decoding with Python's urllib module
http://www.nowamagic.net/academy/detail/1302863
Python's requests module
http://docs.python-requests.org/en/master/user/quickstart/
`
>>> payload = {'key1': 'value1', 'key2': ['value2', 'value3']}
>>> r = requests.get('http://httpbin.org/get', params=payload)
>>> print(r.url)
http://httpbin.org/get?key1=value1&key2=value2&key2=value3
`
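If requests is not installed, the same query string can be produced with the standard library. This is only a sketch of the URL-building step (it sends nothing), written for Python 3; build_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

payload = {"key1": "value1", "key2": ["value2", "value3"]}


def build_url(base, params):
    """Append params to base; doseq=True expands list values into repeated keys."""
    return base + "?" + urlencode(params, doseq=True)


url = build_url("http://httpbin.org/get", payload)
```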
Simulated login for a number of well-known sites, to make it easier to crawl sites that require login
https://github.com/xchaoinfo/fuck-login
Sending requests with cookies using Requests in Python
http://docs.python-requests.org/zh_CN/latest/user/advanced.html
http://docs.python-requests.org/zh_CN/latest/user/quickstart.html#cookie
Parsing JSON data returned in an HTTP response in Python
http://stackoverflow.com/questions/16877422/parsing-json-responses
http://stackoverflow.com/questions/6386308/http-requests-and-json-parsing-in-python
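As a minimal sketch of what those answers boil down to on Python 3: the bytes returned by response.read() are decoded and handed to json.loads. The function name and the sample body below are invented for illustration, and UTF-8 is an assumed default charset.

```python
import json


def parse_json_body(body, charset="utf-8"):
    """Decode raw response bytes and parse them as JSON."""
    return json.loads(body.decode(charset))


# Made-up response body standing in for the result of response.read().
sample_body = b'{"code": 0, "data": {"uid": "123"}}'
result = parse_json_body(sample_body)
```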
Several ways to make HTTP requests with a cookie string in Python (1)
http://www.lijiejie.com/python-http-request-with-cookie-string/
http://www.lijiejie.com/python-http-request-with-cookie-string-part-2/
Reading Firefox cookies with Python
https://blog.lilydjwg.me/2017/11/6/retrieve-cookies-from-firefox-in-python.211149.html
https://github.com/lilydjwg/winterpy/blob/master/pylib/firefoxcookies.py
CookieMonster – a tool for extracting credentials and cookies from browsers (currently Chrome only)
https://github.com/rasta-mouse/CookieMonster