curl, wget, and download speed

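Listing 1 shows curl being run against a popular news site. The output, which would normally be HTML, is sent to /dev/null with the -o option. The -s option suppresses all status information. The -w option tells curl to write out the timers listed in Table 1:

These timers are all measured from the start of the transaction, before even the Domain Name Service (DNS) lookup. So, once the request has been sent, the web server took 0.272 - 0.081 = 0.191 seconds to process it and begin sending data back. The client then took 0.779 - 0.272 = 0.507 seconds to download the data from the server.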
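The subtraction above can be reproduced with a short shell sketch. The function name phase_times is hypothetical, and the three arguments are assumed to be the time_connect, time_starttransfer, and time_total values behind the figures quoted in the text (0.081, 0.272, 0.779).

```shell
# Hypothetical helper: split curl's cumulative timers into two phases.
# Arguments: time_connect time_starttransfer time_total (all in seconds).
phase_times() {
    awk -v c="$1" -v s="$2" -v t="$3" 'BEGIN {
        # The timers are cumulative, so each phase is a difference.
        printf "server=%.3f download=%.3f\n", s - c, t - s
    }'
}

# Example with the figures quoted in the text:
# phase_times 0.081 0.272 0.779
```

Keeping the arithmetic in awk avoids depending on bc, which is not installed everywhere.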

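Timer	Description
time_connect	Time taken to establish the TCP connection to the server
time_starttransfer	Time until the web server returns the first byte of data after the request has been sent
time_total	Time taken to complete the request
time_namelookup	DNS resolution time, from the start of the request until name resolution completes (remember to stop the Linux nscd service when testing)
speed_download	Download speed, in bytes per second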

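Watching the curl figures and how they trend over time gives a good picture of how responsive the site is to users. The variables above are printed in whatever format curl considers suitable. Each variable is written as %{variable_name}; to print a literal %, double it as %%. In addition, \n is a newline, \r a carriage return, and \t a tab.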

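Of course, a web site is more than just pages. There are images, JavaScript code, CSS, and cookies to handle as well. curl is well suited to measuring the response time of a single element, but sometimes you need to know how fast the whole page loads.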

curl -o /dev/null -s -w '%{http_code}:%{time_connect}:%{time_starttransfer}:%{time_total}\n' http://www.miotour.com

-s makes curl silent. Without -s, curl also prints its progress meter, which is not wanted in scripts and similar settings.

curl -o /dev/null -sw '%{http_code}:%{time_total}:%{time_connect}:%{time_starttransfer}\n' http://www.miotour.com

curl -o /dev/null -s -w '%{time_connect}:%{time_starttransfer}:%{time_total}\n' http://www.miotour.com

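time_starttransfer is the time until the web server returns the first byte of data after the request has been sent.

After the request has been sent, the time the web server takes to process it and begin sending data back is time_starttransfer minus time_connect.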

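Sometimes, to test network conditions, you need the time spent in each phase: DNS resolution, establishing the connection, from connection to being ready to transfer, from connection to the start of the transfer, and the whole operation, together with the amount of data downloaded, the download speed, the amount uploaded, the upload speed, and so on. The command below collects this information: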

Using cURL to measure a site's response times (DNS resolution time, response time, transfer time, and so on):

 
Example

    curl -o /dev/null -s -w '%{http_code}:%{http_connect}:%{content_type}:%{time_namelookup}:%{time_redirect}:%{time_pretransfer}:%{time_connect}:%{time_starttransfer}:%{time_total}:%{speed_download}\n' digdeeply.org

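This shows curl run against my own blog. The output, normally HTML, is sent to /dev/null with the -o option. The -s option suppresses all status information. The -w option tells curl to print the timer values.

The time segments within a single HTTP request: DNS resolution, waiting for the server response, fetching the content, and so on.

Below is a detailed explanation of the -w option, translated by me (DigDeeply). Please point out anything that is wrong. (English original: http://curl.haxx.se/docs/manpage.html)

The available variable names are:

-w, --write-out
  The variables below are printed in whatever format curl considers suitable. Each variable is written as %{variable_name}; to print a literal %, double it as %%. In addition, \n is a newline, \r a carriage return, and \t a tab.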

url_effective The URL that was fetched last. This is most meaningful if you've told curl to follow location: headers.

filename_effective The ultimate filename that curl writes out to. This is only meaningful if curl is told to write to a file with the --remote-name or --output option. It's most useful in combination with the --remote-header-name option. (Added in 7.25.1)

http_code The HTTP status code, for example 200 OK, 301 redirect, 404 not found, 500 server error. (The numerical response code that was found in the last retrieved HTTP(S) or FTP(s) transfer. In 7.18.2 the alias response_code was added to show the same info.)

http_connect The numerical code that was found in the last response (from a proxy) to a curl CONNECT request. (Added in 7.12.4)

time_total Total time, in seconds, to three decimal places. (The total time, in seconds, that the full operation lasted. The time will be displayed with millisecond resolution.)

time_namelookup DNS resolution time: from the start of the request until name resolution completes. (The time, in seconds, it took from the start until the name resolving was completed.)

time_connect Connection time: from the start until the TCP connection is established, including the preceding DNS resolution time. To get the connect time alone, subtract time_namelookup from time_connect. The same applies to the timers below and is not repeated. (The time, in seconds, it took from the start until the TCP connect to the remote host (or proxy) was completed.)

time_appconnect Time until the application-level connection is complete, for example the SSL/SSH connect or handshake. (The time, in seconds, it took from the start until the SSL/SSH/etc connect/handshake to the remote host was completed. (Added in 7.19.0))

time_pretransfer Time from the start until the transfer is about to begin. (The time, in seconds, it took from the start until the file transfer was just about to begin. This includes all pre-transfer commands and negotiations that are specific to the particular protocol(s) involved.)

time_redirect Redirect time, covering the DNS resolution, connect, pretransfer, and transfer of every redirect before the final transaction. (The time, in seconds, it took for all redirection steps include name lookup, connect, pretransfer and transfer before the final transaction was started. time_redirect shows the complete execution time for multiple redirections. (Added in 7.12.3))

time_starttransfer Time to start of transfer: the time until the web server returns the first byte of data after the request has been sent. (The time, in seconds, it took from the start until the first byte was just about to be transferred. This includes time_pretransfer and also the time the server needed to calculate the result.)

size_download Download size. (The total amount of bytes that were downloaded.)

size_upload Upload size. (The total amount of bytes that were uploaded.)

size_header Size of the downloaded headers. (The total amount of bytes of the downloaded headers.)

size_request Size of the request. (The total amount of bytes that were sent in the HTTP request.)

speed_download Download speed, in bytes per second. (The average download speed that curl measured for the complete download. Bytes per second.)

speed_upload Upload speed, in bytes per second. (The average upload speed that curl measured for the complete upload. Bytes per second.)

content_type The Content-Type; for example, fetching my blog's home page returns text/html; charset=UTF-8. (The Content-Type of the requested document, if there was any.)

num_connects Number of new connects made in the recent transfer. (Added in 7.12.3)

num_redirects Number of redirects that were followed in the request. (Added in 7.12.3)

redirect_url When a HTTP request was made without -L to follow redirects, this variable will show the actual URL a redirect would take you to. (Added in 7.18.2)

ftp_entry_path The initial path libcurl ended up in when logging on to the remote FTP server. (Added in 7.15.4)

ssl_verify_result Result of the SSL verification; 0 means verification succeeded. (The result of the SSL peer certificate verification that was requested. 0 means the verification was successful. (Added in 7.19.0))

If -w is given several times, the format of the last one is used. (If this option is used several times, the last one will be used.)
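Because the time_* variables are cumulative, the per-phase figures come from subtracting neighbouring timers, as the time_connect entry notes. A minimal sketch of that idea follows; the function name curl_phases and the output labels (dns, connect, pretransfer, wait, transfer) are hypothetical and only illustrate the subtraction.

```shell
# Hypothetical helper: print the time spent in each phase of one request.
# curl reports cumulative timers, so each phase is the difference between
# two neighbouring timers.
curl_phases() {
    curl -o /dev/null -s \
        -w '%{time_namelookup} %{time_connect} %{time_pretransfer} %{time_starttransfer} %{time_total}\n' \
        "$1" |
    awk '{
        printf "dns=%.3f connect=%.3f pretransfer=%.3f wait=%.3f transfer=%.3f\n",
            $1, $2 - $1, $3 - $2, $4 - $3, $5 - $4
    }'
}

# Usage (the URL is the one used earlier in the text):
# curl_phases http://www.miotour.com
```

For HTTPS sites the pretransfer figure here also absorbs the TLS handshake; time_appconnect could be added as a further column to separate it.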

If reposting, please credit the source: DigDeeply's Blog, "Using cURL to measure a site's response times: DNS resolution time, response time, transfer time"

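The curl and wget commands are now available on both Linux and Windows; this is covered below.

curl supports http, https, ftp, ftps, scp, telnet, and other network protocols; see the manual (man curl) for details.

Installing wget: sudo apt-get install wget (as a normal user you will be asked for a password; as root you will not)

On Windows, the curl download unpacks directly to curl.exe; copy it to the system command directory C:\Windows\System32.

On Windows, the wget download unpacks to wget-1.11.4-1-setup.exe, which must be installed; after installing, add its install directory to Path under Environment Variables - System variables.

Web pages can be fetched in two main ways, directly by URL or through a proxy. Both are described below, using the Baidu home page as the example.

Sometimes a page cannot be downloaded for the moment because of a slow connection, packet loss, a server outage, or similar problems.

In that case you may need to retry the connection several times and ask the server to respond; if there is still no response after several attempts, you can conclude that the server has a problem.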

curl --retry 10 --retry-delay 60 --retry-max-time 60 http://www.baidu.com/ -o baidu_html

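Note: --retry is the number of retries; --retry-delay is the interval between two retries (in seconds); --retry-max-time is the total time (in seconds) during which retries may be made, after which curl stops retrying. With --retry-delay and --retry-max-time both set to 60, as above, there is room for roughly one retry.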
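The "try several times, then conclude the server has a problem" idea can also be sketched in plain shell, independent of curl's own retry options. The function name retry is hypothetical; it assumes the wrapped command signals failure through a non-zero exit status (for curl, -f makes HTTP errors count as failures).

```shell
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between attempts. Returns 0 on the first success, 1 if every try fails.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        "$@" && return 0
        # Do not sleep after the final failed attempt.
        [ "$n" -lt "$attempts" ] && sleep "$delay"
        n=$((n + 1))
    done
    return 1
}

# Usage: up to 10 attempts, 60 seconds apart.
# retry 10 60 curl -f -s -o baidu_html http://www.baidu.com/ || echo "server problem"
```

A wrapper like this works for any command, which is handy when the tool being called has no retry options of its own.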

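Note (wget): -t (--tries) is the number of tries; -w (--wait) is the number of seconds to wait between retrievals; -T is the timeout, after which the connection attempt counts as failed and the next attempt begins.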

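Addendum: curl can also judge indirectly whether the server is responding, from the number of bytes downloaded over a period of time. The command format is as follows:

Note: -y is the length of the measurement window (in seconds); -Y is the minimum average speed (in bytes per second) over that window, below which curl aborts the transfer; -m is the maximum time allowed for the whole operation, after which curl gives up and drops the connection.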
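A minimal sketch of how the three options combine is given here. The function name fetch_unless_stalled and the numbers (60 seconds, 1 byte per second, 300 seconds) are assumptions chosen for illustration, not values from the original article.

```shell
# Hypothetical helper: abort a download that is too slow or takes too long.
# $1 is the URL, $2 the output file. The numbers are illustrative only.
fetch_unless_stalled() {
    # -y 60 -Y 1 : give up if the average speed stays below 1 byte/s for 60 s
    # -m 300     : give up if the whole operation takes longer than 300 s
    curl -y 60 -Y 1 -m 300 -s -o "$2" "$1"
}

# Usage: a non-zero exit status suggests the server is not responding.
# fetch_unless_stalled http://www.baidu.com/ baidu_html || echo "no response"
```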

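Downloading through a proxy means fetching the URL indirectly by connecting to an intermediate server, rather than connecting straight to the web site's server.

xroxy.com (filter by port type, proxy type, and country)

As an example, pick a free proxy server in China from freeproxylists.net to show how to fetch a page through a proxy:

218.107.21.252:8080 (the IP is 218.107.21.252 and the port is 8080, joined by a colon ":" to form a socket address)

curl -x 218.107.21.252:8080 -o aaaaa http://www.baidu.com
(common proxy ports are 80, 8080, 8086, 8888, and 3128; if no port is given, curl assumes 1080)

Note: -x specifies the proxy server (ip:port). curl first connects to the proxy 218.107.21.252:8080, the proxy downloads the Baidu home page, and the proxy then passes the page back to curl on the local machine. curl does not connect to the Baidu server directly; an intermediary proxy does the work.

Downloading through a proxy with wget works a little differently from curl: you first have to set the proxy server with http_proxy=ip:port.

On Ubuntu, for example, create a wget configuration file (.wgetrc) in the current user's home directory (cd ~) and enter the proxy configuration: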
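The proxy configuration itself is not shown in the text. The sketch below is one plausible way to write it, using the wgetrc settings use_proxy and http_proxy; the function name add_wget_proxy is hypothetical, and the second argument exists only so the target file can be overridden.

```shell
# Hypothetical helper: append proxy settings to a wgetrc file.
# $1 is the proxy as ip:port, $2 the file (default: ~/.wgetrc).
add_wget_proxy() {
    rc=${2:-"$HOME/.wgetrc"}
    {
        echo "use_proxy = on"
        echo "http_proxy = http://$1/"
    } >> "$rc"
}

# Usage with the proxy from the text:
# add_wget_proxy 218.107.21.252:8080
# wget http://www.baidu.com/ -O baidu_html2
```

Once the file is in place, ordinary wget commands go through the proxy without extra options.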

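For the ftp protocol, recursing into subdirectories, and other uses of curl and wget, see the man pages.

In China, for certain reasons, some sensitive foreign web sites are generally hard to reach directly and require a VPN or a proxy server.

If your campus or education network has IPv6, you can reach facebook, twitter, 六维空间, and similar sites through the free sixxs.org proxy.

In fact, besides VPN and the IPv6 + sixxs.org proxy, ordinary users still have other ways to reach foreign sites.

xroxy.com (filter by port type, proxy type, and country)

Using curl with free proxies from freeproxylists.net, I built page fetching and trend-chart queries for the Google Play game rankings of 12 countries worldwide (the fetching module is written entirely in Shell, about 1000 lines of core code).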

curl vs Wget

1 wget

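wget is the most commonly used download command on Linux. The usual form is: wget, a space, then the URL of the file to download.

A quick word on the -c option, which is also very common: it resumes an interrupted download, so if a download is stopped by accident you can run the command again to continue where it left off.

wget is a free tool for downloading files from the network automatically. It supports the HTTP, HTTPS, and FTP protocols and can use an HTTP proxy.

Automatic downloading means that wget can keep running in the background after the user logs out. You can log in, start a wget download, and log out, and wget will carry on in the background until the job is done. Compared with most browsers, which need the user present throughout a large download, this saves a great deal of trouble.

wget can follow the links on an HTML page and download them in turn to create a local copy of the remote server, fully recreating the original site's directory structure. This is often called "recursive downloading". When downloading recursively, wget honors the Robot Exclusion standard (/robots.txt). While downloading, wget can also convert links to point at the local files, for convenient offline browsing.

wget is very stable and copes well with narrow bandwidth and unreliable networks. If a download fails because of the network, wget keeps retrying, up to the number of tries set with -t (-t 0 means no limit), until the whole file has been downloaded. If the server interrupts the download, wget reconnects and resumes from where it stopped. This is very useful for downloading large files from servers that limit connection time.

* Downloading a partially downloaded file over an unreliable network, and downloading during idle hours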

wget -t 0 -w 31 -c -B ftp://dsec.pku.edu.cn/linuxsoft -i filelist.txt -o down.log &

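The command above can also be used to download when the network is quiet. My habit is to copy URLs that are inconvenient to download right away from mozilla to the clipboard and paste them into the file filelist.txt, then run the command above before leaving the system in the evening.

-e, --execute=COMMAND Execute a command in `.wgetrc' format; for the wgetrc format see /etc/wgetrc or ~/.wgetrc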

http://www.itqun.net/content-detail/511328.html
http://www.guanwei.org/post/LINUXnotes/05/Linux-Wget-download-method.html

How to download files over HTTP from the Linux command line
Post by mrchen, 2010-5-23
Original article. If reposting, please credit: reposted from 冠威博客 [ http://www.guanwei.org/ ]
Link to this article: http://www.guanwei.org/post/LINUXnotes/05/Linux-Wget-download-method.html

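Incidentally, to download a file from an ftp server you can use the ftp command and then fetch the file with its get command.

For those who like working at the command line and want efficient, fast downloads, command-line download tools are recommended. They are not only convenient to use; most also download quickly and efficiently, and they are especially suited to downloading files in bulk. These tools are introduced in detail below.

Wget is a very commonly used command-line download tool, included by default in most Linux distributions. If it is not installed, download the latest version from http://www.gnu.org/software/wget/wget.html and build and install it with the following commands: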

    #tar zxvf wget-1.9.1.tar.gz
    #cd wget-1.9.1
    #./configure
    #make
    #make install

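◆-t: number of tries; how many times Wget tries when it cannot establish a connection to the server.

◆-c: resume; if a download is interrupted, it continues from the previous break point when the connection is restored.

Besides these common functions, Wget also supports HTTP and FTP proxies, set in its configuration file "/etc/wgetrc". Open that file in the vi editor, remove the # in front of "http_proxy" and "ftp_proxy", enter the proxy server addresses after them, then save and quit. Wget can also download an entire web site, for example the whole Man manual center. Just enter the following command:

    #wget -r -p -np -k http://man.chinaunix.net

Here -r means recursive download, -p means download all the files needed to display the page completely, such as images, -np means do not ascend to the parent directory, and -k means convert absolute links to relative ones suitable for local viewing.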

2 Prozilla

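Prozilla is another very popular command-line download tool, with multithreaded downloading and resume support. Download the latest 1.3.7.4 package from http://prozilla.genesys.ro/ and install it with the following commands: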

    #tar zxvf prozilla-1.3.7.4.tar.gz
    #cd prozilla-1.3.7.4
    #./configure
    #make
    #make install

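The Prozilla command format is:

    #proz [options] [download URL]

Commonly used options:

◆-k=n: download with n threads. Without this option, Prozilla defaults to 4 threads.

◆-r, --resume: continue downloading an unfinished file.

To download with a specific number of threads, use a command such as:

    #proz -k=5 http://64.12.204.21/pub/mozilla.org/firefox/releases/1.0/linux-i686/zh-CN/firefox-1.0.installer.tar.gz

This downloads the file with 5 threads and saves it in the current directory. Like Wget, Prozilla also supports resuming: after an interrupted download, enter the same command again, and when prompted to resume, press R to continue.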

3 Myget

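MyGet is designed to be an extensible multithreaded download tool with a rich interface. It supports HTTP, FTP, HTTPS, MMS, RTSP, and other protocols. Download its latest version, 0.1.0, from http://myget.sourceforge.net/release/myget-0.1.0.tar.bz2 and install it with the following commands: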

    #tar jxvf myget-0.1.0.tar.bz2
    #cd myget-0.1.0
    #./configure
    #make
    #make install

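◆-d [directory]: where to store the downloaded file locally; defaults to the current directory.

◆-x [proxy server address]: set the proxy server address, for example "-x http://user:password@host:port".

A typical MyGet invocation:

    #mytget -d /root/ -n 10 http://lumaqq.linuxsir.org/download/patch/lumaqq_2004t_patch_2005.07.21.00.00.zip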

4 Linuxdown

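Linuxdown is a command-line multithreaded download tool that supports up to 30 threads. Download the latest 1.1.0 version from https://gro.clinux.org/frs/download.php/1015/linuxdown-1.0.0.tar.gz and then build and install it with the following commands: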

    #tar zxvf linuxdown-1.1.0.tar.gz
    #cd dandelion/
    #make
    #make install

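The Linuxdown format is:

    #linuxdown [download URL] [options] [thread count]

Note that the download URL and the options must both be enclosed in ASCII quotes, and the thread count must not exceed 30. A typical download looks like this:

    #linuxdown "http://lumaqq.linuxsir.org/download/patch/lumaqq_2004t_patch_2005.07.21.00.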

5 Curl

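Curl is another good command-line download tool for Linux: small and fast, with the one drawback that it does not support multithreaded downloading. Download the latest version from http://curl.haxx.se/download/curl-7.14.0.tar.gz and then build and install it with the following commands: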

    #tar zxvf curl-7.14.0.tar.gz
    #cd curl-7.14.0/
    #./configure
    #make
    #make test
    #make install

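The Curl command format is:

    #curl [options] [download URL]

A typical Curl download:

    #curl -O http://10.1.27.10/~kennycx/tools/lumaqq_2004-linux_gtk2_x86_with_jre.tar.gz

This downloads a file with Curl and saves it in the current directory. Although Curl does not support multithreaded downloading, it can download several files at once or download just part of a file, for example:

    #curl -r 0-199 http://www.netscape.com/

This fetches the first 200 bytes of the file. Downloading through a proxy, a common need, is also easy with Curl:

    #curl -x 10.1.27.10:1022 ftp://ftp.funet.fi/README

This downloads a file through the proxy server at address 10.1.27.10, port 1022.

    #curl -U user:passwd -x 10.1.27.10:1022 ftp://ftp.funet.fi/README

If the proxy server requires authentication, enter a valid account in place of user:passwd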
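The range syntax of -r counts bytes from zero and includes both ends, which is why 0-199 means 200 bytes. As a rough illustration of that arithmetic, the sketch below prints the ranges that would split a file into nearly equal parts. The function name byte_ranges is hypothetical, and the file size is assumed to be known in advance.

```shell
# Hypothetical helper: print the byte ranges that split a file of $1 bytes
# into $2 nearly equal parts, one "start-end" range per line.
byte_ranges() {
    awk -v size="$1" -v parts="$2" 'BEGIN {
        chunk = int((size + parts - 1) / parts)   # ceiling division
        for (start = 0; start < size; start += chunk) {
            end = start + chunk - 1
            if (end > size - 1) end = size - 1    # clamp the last range
            printf "%d-%d\n", start, end
        }
    }'
}

# Usage: each printed range can be passed to curl -r, for example
# curl -r 0-199 -o part0 http://www.netscape.com/
```

The last range is clamped to the final byte, so the parts always cover the file exactly once.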

6 Axel

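Axel is a multithreaded command-line download tool with resume support; it is usually several times faster than Wget. It can be downloaded from http://www.linuxfans.org/nuke/modules.php?name=Site_Downloads&op=mydown&did=1697. After downloading, build and install it with the following commands: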

    #tar zxvf axel-1.0a.tar.gz
    #cd axel-1.0a/
    #./configure
    #make
    #make install

Basic usage:

    #axel [options] [download directory] [download URL]

A typical download:

    #axel -n 10 -o /home/kennycx/ http://10.1.27.10/~kennycx/tools/lumaqq_2004-linux_gtk2_x86_with_jre.tar.gz

This downloads the file at the given URL into the directory /home/kennycx/ using 10 threads.

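This article has covered the download tools commonly used on Linux in detail. Each has its own strengths, and all are fairly simple to use, so whether you are a beginner or a Linux expert there is one to suit you.

Files on HTTP sites can also be downloaded from the Linux command line. Incidentally, for an ftp site you can use the ftp command and then get XXX.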