Using a Proxy with wget and curl

=Start=

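Background:

One of the blogs I follow published a post titled "Using a proxy with wget and curl". It looked like something I might need later, so I tested and verified the commands and am recording them here for future use.

Reference solution: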

# Specify the proxy directly on the command line
wget -e "use_proxy=yes" -e "http_proxy=10.1.4.43:8080" ixyzero.com
curl -x 10.4.90.9:8080 ixyzero.com

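# Use a proxy through an environment variable (pick one of the two assignments)
http_proxy="http://mycache.mydomain.com:3128"
http_proxy="http://myuser:[email protected]:3128"  # with username/password, in the form http://USER:PASSWORD@HOST:PORT
export http_proxy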
wget ixyzero.com
curl ixyzero.com
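
Exporting http_proxy affects every later command in the shell session. As a minimal sketch, assuming a POSIX shell, the hypothetical helper with_proxy below (not part of the original post) sets the variables for a single command only. It also sets https_proxy, which both tools read for https:// URLs. The proxy address is the one used in the examples above.

```shell
#!/bin/sh
# with_proxy PROXY_URL COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND with the proxy variables set for that one
# process only. The function body is a subshell, so nothing leaks back into
# the calling shell.
with_proxy() (
    proxy_url=$1
    shift
    # Both curl and wget read the lowercase variable names.
    exec env http_proxy="$proxy_url" https_proxy="$proxy_url" "$@"
)

# Show what a child process sees, without touching the network:
with_proxy "http://10.4.90.9:8080" sh -c 'echo "$http_proxy"'

# Real use (needs network access and a reachable proxy):
# with_proxy "http://10.4.90.9:8080" curl -I http://ixyzero.com/blog/
# with_proxy "http://10.4.90.9:8080" wget ixyzero.com
```

Using env rather than a plain variable prefix keeps the behavior the same whether the command is a binary or a script.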

curl -x "http://mydomain.com:3128" ixyzero.com
curl -x "http://myuser:[email protected]:3128" ixyzero.com

curl --user-agent "curl_with_proxy" -I http://ixyzero.com/blog/
curl --user-agent "curl_with_proxy" -I -x "10.4.90.9:8080" http://ixyzero.com/blog/

wget -U "wget_with_proxy" http://ixyzero.com/blog/regex.html
wget -U "wget_with_proxy" -e "use_proxy=yes" -e "http_proxy=10.4.90.9:8080" http://ixyzero.com/blog/regex.html
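
The -e options above pass wgetrc-style commands to wget, so the same settings can also live in a config file. The sketch below is an assumption-laden illustration rather than part of the original post: it writes the settings to throwaway files in a temporary directory (instead of ~/.wgetrc and ~/.curlrc) and reuses the proxy address and user-agent string from the examples above.

```shell
#!/bin/sh
# Write the proxy settings to throwaway config files so that nothing in the
# home directory is overwritten.
conf_dir=$(mktemp -d)

# wgetrc syntax: the same "name = value" commands that -e accepts.
cat > "$conf_dir/wgetrc" <<'EOF'
use_proxy = on
http_proxy = http://10.4.90.9:8080
https_proxy = http://10.4.90.9:8080
EOF

# curl config syntax: long option names without the leading dashes.
cat > "$conf_dir/curlrc" <<'EOF'
proxy = "http://10.4.90.9:8080"
user-agent = "curl_with_proxy"
EOF

echo "config files written to $conf_dir"

# Point each tool at its file explicitly (needs network and a reachable proxy):
# wget --config="$conf_dir/wgetrc" http://ixyzero.com/blog/regex.html
# curl -K "$conf_dir/curlrc" -I http://ixyzero.com/blog/
```

Copying the same lines into ~/.wgetrc and ~/.curlrc would make them the default for every invocation.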

Reference links:

=END=

May 14, 2016

admin

KnowledgeBase, Linux, Tools

curl, Linux, proxy, wget

18 comments on "Using a Proxy with wget and curl"

a-z said:

2016-12-08 15:53

A proxy-harvesting script written in Python
https://github.com/stamparm/fetch-some-proxies

https://github.com/Greyh4t/ProxyPool

https://github.com/qiyeboy/IPProxys

a-z said:

2016-12-14 11:31

Another proxy-harvesting script
https://github.com/DanMcInerney/elite-proxy-finder

A simple proxy IP pool for crawlers
https://github.com/jhao104/proxy_pool

a-z said:

2017-03-17 15:24

[MiSRC] Tech sharing: the small matter of web crawlers
http://mp.weixin.qq.com/s?__biz=MzI2NzI2OTExNA==&mid=2247484262&idx=1&sn=423ead267430e873a1e86dec1b531c7f

a-z said:

2017-05-18 11:43

An IP proxy pool implemented in Golang
https://github.com/henson/ProxyPool

a-z said:

2017-07-04 22:16

Proxy [Finder | Checker | Server]. HTTP(S) & SOCKS
http://proxybroker.readthedocs.io
https://github.com/constverum/ProxyBroker

a-z said:

2017-07-20 14:04

getproxy is a program that scrapes proxy-listing websites to collect http/https proxies
https://github.com/fate0/getproxy
https://github.com/fate0/proxylist

a-z said:

2017-08-22 19:59

A roundup of Python crawler projects
https://segmentfault.com/p/1210000009117809/read
http://blog.csdn.net/u011781521/article/details/70179998

a-z said:

2017-09-27 12:10

The basic patterns of web crawlers, in plain terms
https://blog.thankbabe.com/2017/09/25/spider/
https://github.com/SFLAQiu/SpiderDemo

a-z said:

2017-10-16 11:00

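Fighting back against crawlers: how creative can front-end engineers get?
http://litten.me/2017/07/09/prevent-spiders/
`
2. The back end and anti-crawling
Common yet effective anti-crawling measures on the back end include:
· User-Agent + Referer checks
· Account and cookie validation
· CAPTCHAs
· Per-IP rate limiting

Crawlers, however, can get arbitrarily close to a real user, for example:
· Chrome headless or PhantomJS to simulate a browser environment
· tesseract to recognize CAPTCHAs
· Proxy IPs, which can be bought on Taobao
So a 100% effective anti-crawling strategy does not exist. It is mostly grunt work, and a question of how hard you make it.

3. The front end and anti-crawling
3.1 font-face composition
3.2 background composition
3.3 Character interleaving
3.4 Pseudo-element hiding
3.5 Element positioning overlay
3.6 Asynchronous iframe loading
3.7 Character splitting
3.8 Character-set substitution
`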

a-z said:

2017-11-10 15:44

How to control a Headless Chrome browser with the Node-based Puppeteer, and write automated crawlers this way
https://codeburst.io/a-guide-to-automating-scraping-the-web-with-javascript-chrome-puppeteer-node-js-b18efb9e9921

https://tutorialzine.com/2017/08/automating-google-chrome-with-node-js
https://github.com/GoogleChrome/puppeteer
https://github.com/cheeaun/puppetron
https://medium.com/@e_mad_ehsan/getting-started-with-puppeteer-and-chrome-headless-for-web-scrapping-6bf5979dee3e

a-z said:

2017-11-22 21:45

A guide to countering web scrapers, Part 1
http://www.4hou.com/technology/8482.html
https://github.com/JonasCz/How-To-Prevent-Scraping/blob/master/README.md

a-z said:

2017-11-28 19:40

A guide to countering web scrapers, Part 2
http://www.4hou.com/web/8736.html

a-z said:

2017-12-04 15:50

gowitness: a Chrome Headless based tool, written in Golang, for generating web page screenshots
https://sensepost.com/blog/2017/gowitness-a-new-tool-for-an-old-idea/
https://github.com/sensepost/gowitness

a-z said:

2018-01-19 13:26

New ways to detect Chrome Headless mode
https://antoinevastel.github.io/bot%20detection/2018/01/17/detect-chrome-headless-v2.html
`
// User agent (old: check the UA string)
if (/HeadlessChrome/.test(window.navigator.userAgent)) {
    console.log("Chrome headless detected");
}

// Webdriver (new: check navigator.webdriver)
if (navigator.webdriver) {
    console.log("Chrome headless detected");
}

// Chrome (new: check window.chrome)
// isChrome is true if the browser is Chrome, Chromium or Opera
if (isChrome && !window.chrome) {
    console.log("Chrome headless detected");
}

// Permissions (new: check navigator.permissions)
navigator.permissions.query({name: 'notifications'}).then(function (permissionStatus) {
    if (Notification.permission === 'denied' && permissionStatus.state === 'prompt') {
        console.log('This is Chrome headless');
    } else {
        console.log('This is not Chrome headless');
    }
});

// Plugins (old: check navigator.plugins)
if (navigator.plugins.length === 0) {
    console.log("It may be Chrome headless");
}

// Languages (old: check navigator.languages)
// An array never strictly equals "", so test for a missing or empty list instead
if (!navigator.languages || navigator.languages.length === 0) {
    console.log("Chrome headless detected");
}
`

a-z said:

2018-03-10 13:40

Headless Chrome and API
https://thief.one/2018/03/06/1/
`
1. Introduction to Headless Chrome
2. Installing Headless Chrome
3. Basic usage of Headless Chrome
4. Headless Chrome API
5. Common problems
6. References
`

hi said:

2018-04-23 13:32

A Scrapy middleware for using multiple proxies (This package provides a Scrapy middleware to use rotating proxies, check that they are alive and adjust crawling speed.)
https://github.com/TeamHG-Memex/scrapy-rotating-proxies

abc said:

2024-05-08 14:15

Set up an HTTP proxy quickly and easily with Squid
https://blog.kieng.cn/2717.html

Quickly set up an authenticated HTTP proxy server with Squid on CentOS 7
https://blog.phpgao.com/squid_proxy_with_basic_auth.html

abc said:

2024-05-09 14:15

Setup Squid Proxy With Security Best Practice
https://github.com/password123456/setup-squid-proxy-with-security-best-practice
`
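Security best practices when using a Squid proxy as a "forward proxy"

If you are configuring a reverse proxy, some topics in this guide may not apply. When using a reverse proxy, we recommend cross-referencing other security guides to find appropriate hardening standards.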
`
