An analysis of the third-party Python module requests

Reposted article. 2019-03-25 15:03. Tags: 2.7, network programming

I. Overview

Requests is an HTTP library for Python.

II. Module layout

'''
    __version__
    _internal_utils
    adapters
    api
    auth
    certs
    compat
    cookies
    exceptions
    help
    hooks
    models
    packages
    sessions
    status_codes
    structures
    utils
'''

Packages

'''
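GET: request the resource at the URL
HEAD: request only the response headers for the resource at the URL
POST: append new data to the resource at the URL
PUT: store a resource at the URL, overwriting whatever is already there
PATCH: partially update the resource at the URL, changing only part of its content
DELETE: delete the resource stored at the URL
HTTP methods map one-to-one onto functions in the requests library.
The 7 main functions of the requests library:
requests.request()   builds and sends a request; the base function the others rely on
requests.get()       the main way to fetch an HTML page; corresponds to HTTP GET
requests.head()      fetches a page's header information; corresponds to HTTP HEAD
requests.post()      submits a POST request to a page; corresponds to HTTP POST
requests.put()       submits a PUT request to a page; corresponds to HTTP PUT
requests.patch()     submits a partial-modification request; corresponds to HTTP PATCH
requests.delete()    submits a delete request; corresponds to HTTP DELETE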
'''

Function

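III. Usage

# -*- coding: utf-8 -*-
# 1. Import the module
import requests
# 2. Fetch the page with get(); obj is a Response object
obj = requests.get("https://www.baidu.com/")
# 3. Check the status code; 200 means the request succeeded
print(obj.status_code)
# 4. Tell requests to decode the body as UTF-8
obj.encoding = 'utf-8'
# 5. Print the page content
print(obj.text)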

A Simple Example:
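
The script above assumes the request always succeeds. A slightly more defensive variant might look like the following sketch for Python 3. `fetch_text` is a hypothetical helper name, and the 10-second timeout is an arbitrary choice; neither comes from the original post.

```python
import requests


def fetch_text(url, timeout=10):
    """Return the decoded body of `url`, or None if the request fails.

    fetch_text is a hypothetical helper written for this example.
    """
    try:
        resp = requests.get(url, timeout=timeout)
        # Turn 4xx/5xx responses into requests.exceptions.HTTPError.
        resp.raise_for_status()
    except requests.exceptions.RequestException as exc:
        # Base class for connection errors, timeouts, bad URLs and HTTP errors.
        print(f"request failed: {exc}")
        return None
    # Guess the encoding from the body instead of hard-coding it.
    resp.encoding = resp.apparent_encoding
    return resp.text
```

Calling `fetch_text("https://www.baidu.com/")` would return the page text on success. A malformed URL such as `"not-a-url"` is rejected before any network traffic happens, so the helper returns `None`.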

'''
obj is a <class 'requests.models.Response'> object

Help on Response in module requests.models object:
class Response(__builtin__.object)
 |  The :class:`Response <Response>` object, which contains a
 |  server's response to an HTTP request.
 |  
 |  Methods defined here:
 |  
 |  __bool__(self)
 |      Returns True if :attr:`status_code` is less than 400.
 |      
 |      This attribute checks if the status code of the response is between
 |      400 and 600 to see if there was a client error or a server error. If
 |      the status code, is between 200 and 400, this will return True. This
 |      is **not** a check to see if the response code is ``200 OK``.
 |  
 |  __enter__(self)
 |  
 |  __exit__(self, *args)
 |  
 |  __getstate__(self)
 |  
 |  __init__(self)
 |  
 |  __iter__(self)
 |      Allows you to use a response as an iterator.
 |  
 |  __nonzero__(self)
 |      Returns True if :attr:`status_code` is less than 400.
 |      
 |      This attribute checks if the status code of the response is between
 |      400 and 600 to see if there was a client error or a server error. If
 |      the status code, is between 200 and 400, this will return True. This
 |      is **not** a check to see if the response code is ``200 OK``.
 |  
 |  __repr__(self)
 |  
 |  __setstate__(self, state)
 |  
 |  close(self)
 |      Releases the connection back to the pool. Once this method has been
 |      called the underlying ``raw`` object must not be accessed again.
 |      
 |      *Note: Should not normally need to be called explicitly.*
 |  
 |  iter_content(self, chunk_size=1, decode_unicode=False)
 |      Iterates over the response data.  When stream=True is set on the
 |      request, this avoids reading the content at once into memory for
 |      large responses.  The chunk size is the number of bytes it should
 |      read into memory.  This is not necessarily the length of each item
 |      returned as decoding can take place.
 |      
 |      chunk_size must be of type int or None. A value of None will
 |      function differently depending on the value of `stream`.
 |      stream=True will read data as it arrives in whatever size the
 |      chunks are received. If stream=False, data is returned as
 |      a single chunk.
 |      
 |      If decode_unicode is True, content will be decoded using the best
 |      available encoding based on the response.
 |  
 |  iter_lines(self, chunk_size=512, decode_unicode=False, delimiter=None)
 |      Iterates over the response data, one line at a time.  When
 |      stream=True is set on the request, this avoids reading the
 |      content at once into memory for large responses.
 |      
 |      .. note:: This method is not reentrant safe.
 |  
 |  json(self, **kwargs)
 |      Returns the json-encoded content of a response, if any.
 |      
 |      :param \*\*kwargs: Optional arguments that ``json.loads`` takes.
 |      :raises ValueError: If the response body does not contain valid json.
 |  
 |  raise_for_status(self)
 |      Raises stored :class:`HTTPError`, if one occurred.
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)
 |  
 |  apparent_encoding
 |      The apparent encoding, provided by the chardet library.
 |  
 |  content
 |      Content of the response, in bytes.
 |  
 |  is_permanent_redirect
 |      True if this Response one of the permanent versions of redirect.
 |  
 |  is_redirect
 |      True if this Response is a well-formed HTTP redirect that could have
 |      been processed automatically (by :meth:`Session.resolve_redirects`).
 |  
 |  links
 |      Returns the parsed header links of the response, if any.
 |  
 |  next
 |      Returns a PreparedRequest for the next request in a redirect chain, if there is one.
 |  
 |  ok
 |      Returns True if :attr:`status_code` is less than 400, False if not.
 |      
 |      This attribute checks if the status code of the response is between
 |      400 and 600 to see if there was a client error or a server error. If
 |      the status code is between 200 and 400, this will return True. This
 |      is **not** a check to see if the response code is ``200 OK``.
 |  
 |  text
 |      Content of the response, in unicode.
 |      
 |      If Response.encoding is None, encoding will be guessed using
 |      ``chardet``.
 |      
 |      The encoding of the response content is determined based solely on HTTP
 |      headers, following RFC 2616 to the letter. If you can take advantage of
 |      non-HTTP knowledge to make a better guess at the encoding, you should
 |      set ``r.encoding`` appropriately before accessing this property.
 |  
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |  
 |  __attrs__ = ['_content', 'status_code', 'headers', 'url', 'history', '...

None

'''

Analysis
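
To see a few of these attributes in action without touching the network, one option is to build a `Response` by hand. This is only a sketch for experimentation: assigning to the private `_content` attribute is a testing trick, not something normal client code should do, and the 404 body here is made up.

```python
import requests

# Hand-built response, for illustration only.
resp = requests.models.Response()
resp.status_code = 404
resp._content = b'{"error": "not found"}'  # private attribute: test-only shortcut
resp.encoding = "utf-8"

print(resp.ok)        # False, because the status code is >= 400
print(resp.content)   # raw bytes
print(resp.text)      # bytes decoded with resp.encoding
print(resp.json())    # body parsed as JSON

try:
    resp.raise_for_status()
except requests.exceptions.HTTPError as exc:
    print("HTTPError:", exc)
```

Note that `ok` only says the status code is below 400; it is not a check for `200 OK`.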

IV. The module's functions in detail

# -*- coding: utf-8 -*-

"""
requests.api
~~~~~~~~~~~~

This module implements the Requests API.

:copyright: (c) 2012 by Kenneth Reitz.
:license: Apache2, see LICENSE for more details.
"""

from . import sessions


def request(method, url, **kwargs):
    """Constructs and sends a :class:`Request <Request>`.

    :param method: method for the new :class:`Request` object.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary, list of tuples or bytes to send
        in the query string for the :class:`Request`.
    :param data: (optional) Dictionary, list of tuples, bytes, or file-like
        object to send in the body of the :class:`Request`.
    :param json: (optional) A JSON serializable Python object to send in the body of the :class:`Request`.
    :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
    :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
    :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.
        ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``
        or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string
        defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers
        to add for the file.
    :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
    :param timeout: (optional) How many seconds to wait for the server to send data
        before giving up, as a float, or a :ref:`(connect timeout, read
        timeout) <timeouts>` tuple.
    :type timeout: float or tuple
    :param allow_redirects: (optional) Boolean. Enable/disable GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection. Defaults to ``True``.
    :type allow_redirects: bool
    :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
    :param verify: (optional) Either a boolean, in which case it controls whether we verify
            the server's TLS certificate, or a string, in which case it must be a path
            to a CA bundle to use. Defaults to ``True``.
    :param stream: (optional) if ``False``, the response content will be immediately downloaded.
    :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response

    Usage::

      >>> import requests
      >>> req = requests.request('GET', 'https://httpbin.org/get')
      <Response [200]>
    """

    # By using the 'with' statement we are sure the session is closed, thus we
    # avoid leaving sockets open which can trigger a ResourceWarning in some
    # cases, and look like a memory leak in others.
    with sessions.Session() as session:
        return session.request(method=method, url=url, **kwargs)


def get(url, params=None, **kwargs):
    r"""Sends a GET request.

    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary, list of tuples or bytes to send
        in the query string for the :class:`Request`.
    :param \*\*kwargs: Optional arguments that ``request`` takes.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response
    """

    kwargs.setdefault('allow_redirects', True)
    return request('get', url, params=params, **kwargs)


def options(url, **kwargs):
    r"""Sends an OPTIONS request.

    :param url: URL for the new :class:`Request` object.
    :param \*\*kwargs: Optional arguments that ``request`` takes.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response
    """

    kwargs.setdefault('allow_redirects', True)
    return request('options', url, **kwargs)


def head(url, **kwargs):
    r"""Sends a HEAD request.

    :param url: URL for the new :class:`Request` object.
    :param \*\*kwargs: Optional arguments that ``request`` takes.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response
    """

    kwargs.setdefault('allow_redirects', False)
    return request('head', url, **kwargs)


def post(url, data=None, json=None, **kwargs):
    r"""Sends a POST request.

    :param url: URL for the new :class:`Request` object.
    :param data: (optional) Dictionary, list of tuples, bytes, or file-like
        object to send in the body of the :class:`Request`.
    :param json: (optional) json data to send in the body of the :class:`Request`.
    :param \*\*kwargs: Optional arguments that ``request`` takes.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response
    """

    return request('post', url, data=data, json=json, **kwargs)


def put(url, data=None, **kwargs):
    r"""Sends a PUT request.

    :param url: URL for the new :class:`Request` object.
    :param data: (optional) Dictionary, list of tuples, bytes, or file-like
        object to send in the body of the :class:`Request`.
    :param json: (optional) json data to send in the body of the :class:`Request`.
    :param \*\*kwargs: Optional arguments that ``request`` takes.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response
    """

    return request('put', url, data=data, **kwargs)


def patch(url, data=None, **kwargs):
    r"""Sends a PATCH request.

    :param url: URL for the new :class:`Request` object.
    :param data: (optional) Dictionary, list of tuples, bytes, or file-like
        object to send in the body of the :class:`Request`.
    :param json: (optional) json data to send in the body of the :class:`Request`.
    :param \*\*kwargs: Optional arguments that ``request`` takes.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response
    """

    return request('patch', url, data=data, **kwargs)


def delete(url, **kwargs):
    r"""Sends a DELETE request.

    :param url: URL for the new :class:`Request` object.
    :param \*\*kwargs: Optional arguments that ``request`` takes.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response
    """

    return request('delete', url, **kwargs)
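
The pattern in this source is that every helper is a thin wrapper that fills in a default and delegates to `request()`. The sketch below imitates that pattern with a stand-in `fake_request` that just records what it was given; it is not the library's code, and `fake_request` is a made-up name.

```python
def fake_request(method, url, **kwargs):
    """Stand-in for requests.request: return what would have been sent."""
    return {"method": method.upper(), "url": url, **kwargs}


def get(url, params=None, **kwargs):
    # setdefault only applies when the caller did not pass allow_redirects.
    kwargs.setdefault("allow_redirects", True)
    return fake_request("get", url, params=params, **kwargs)


def head(url, **kwargs):
    # As in the quoted source, HEAD does not follow redirects by default.
    kwargs.setdefault("allow_redirects", False)
    return fake_request("head", url, **kwargs)


print(get("http://example.com/", params={"q": "x"}))
print(head("http://example.com/"))
print(head("http://example.com/", allow_redirects=True))
```

Because `setdefault` is used rather than plain assignment, an explicit `allow_redirects=` from the caller always wins over the wrapper's default.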

V. Examples

'''
1)  head() example
>>> r = requests.head('http://httpbin.org/get')
>>> r.headers
{'Content-Length': '238', 'Access-Control-Allow-Origin': '*',
'Access-Control-Allow-Credentials': 'true', 'Content-Type':
'application/json', 'Server': 'nginx', 'Connection': 'keep-alive',
'Date': 'Sat, 18 Feb 2017 12:07:44 GMT'}
>>> r.text
''

2)  post() examples
>>> payload = {'key1': 'value1', 'key2': 'value2'}
>>> r = requests.post('http://httpbin.org/post', data=payload)
>>> print(r.text)
{ ...
"form": {
"key2": "value2",
"key1": "value1"
},
}
POSTing a dict: it is form-encoded automatically, so the server reports it under "form".
>>> r = requests.post('http://httpbin.org/post', data='ABC')
>>> print(r.text)
{ ...
"data": "ABC",
"form": {},
}
POSTing a string: it is sent as the raw body, so the server reports it under "data".

3)  put() example
>>> payload = {'key1': 'value1', 'key2': 'value2'}
>>> r = requests.put('http://httpbin.org/put', data=payload)
>>> print(r.text)
{ ...
"form": {
"key2": "value2",
"key1": "value1"
},
}

4)  The request function
requests.request() is the base function underneath all the others.
Its full signature:
requests.request(method, url, **kwargs)
method : the request method, one of 7 (GET, PUT, POST, and so on)
url : URL of the page to fetch
**kwargs: parameters that control the request, 13 in total
method: the 7 request methods
r = requests.request('GET', url, **kwargs)
r = requests.request('HEAD', url, **kwargs)
r = requests.request('POST', url, **kwargs)
r = requests.request('PUT', url, **kwargs)
r = requests.request('PATCH', url, **kwargs)
r = requests.request('DELETE', url, **kwargs)
r = requests.request('OPTIONS', url, **kwargs)

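These are the request methods defined by the HTTP protocol.
OPTIONS asks the server which communication options it supports for talking with the client.
**kwargs: parameters that control the request; all are optional
params : dict or byte sequence, added to the URL as query parameters
>>> kv = {'key1': 'value1', 'key2': 'value2'}
>>> r = requests.request('GET', 'http://python123.io/ws', params=kv)
>>> print(r.url)
http://python123.io/ws?key1=value1&key2=value2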

data : dict, byte sequence, or file object, sent as the body of the Request
>>> kv = {'key1': 'value1', 'key2': 'value2'}
>>> r = requests.request('POST', 'http://python123.io/ws', data=kv)
>>> body = 'body content'
>>> r = requests.request('POST', 'http://python123.io/ws', data=body)

json : JSON-serializable data, sent as the body of the Request
>>> kv = {'key1': 'value1'}
>>> r = requests.request('POST', 'http://python123.io/ws', json=kv)

headers : dict of custom HTTP headers
>>> hd = {'user-agent': 'Chrome/10'}
>>> r = requests.request('POST', 'http://python123.io/ws', headers=hd)

cookies : dict or CookieJar, the cookies sent with the Request
import requests
# Note: this example sends the cookie string through the Cookie header
# rather than through the cookies= parameter.
cookie = "23F5D5F299F9FF7F7541095DA115EFCFADFDF127695462AF30E653A38F03998376B7FA69"
header = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36',
'Connection': 'keep-alive',
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'Cookie': cookie}
r = requests.get("https://www.cnblogs.com/windyrainy/p/10593806.html", headers=header)
r.encoding = "utf-8"
print(r.text)

auth : tuple, enables HTTP authentication

files : dict, for uploading files
>>> fs = {'file': open('data.xls', 'rb')}
>>> r = requests.request('POST', 'http://python123.io/ws', files=fs)
timeout : timeout in seconds
>>> r = requests.request('GET', 'http://www.baidu.com', timeout=10)
proxies : dict, sets the proxy servers to go through; login credentials can be included
>>> pxs = {'http': 'http://user:pass@10.10.10.1:1234',
...        'https': 'https://10.10.10.1:4321'}
>>> r = requests.request('GET', 'http://www.baidu.com', proxies=pxs)

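allow_redirects : True/False, default True; whether to follow redirects
stream : True/False, default False; when False, the response content is downloaded immediately
verify : True/False, default True; whether to verify the server's SSL certificate
cert : path to a local SSL client certificate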
'''
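
How `params`, `data`, `json`, and `auth` end up in the outgoing message can be inspected offline by preparing a request without sending it. The sketch below uses `requests.Request(...).prepare()`; the `prepare` helper and the `user`/`pass` credentials are placeholders invented for illustration.

```python
import json

import requests


def prepare(method, url, **kwargs):
    """Build the request that would be sent, without sending it."""
    return requests.Request(method, url, **kwargs).prepare()


kv = {"key1": "value1", "key2": "value2"}

# params: encoded into the query string of the URL.
p_get = prepare("GET", "http://python123.io/ws", params=kv)
print(p_get.url)

# data=dict: form-encoded body plus a matching Content-Type header.
p_form = prepare("POST", "http://python123.io/ws", data=kv)
print(p_form.headers["Content-Type"], p_form.body)

# data=str: sent as the raw body, unchanged.
p_raw = prepare("POST", "http://python123.io/ws", data="ABC")
print(p_raw.body)

# json: serialized to JSON, Content-Type set to application/json.
p_json = prepare("POST", "http://python123.io/ws", json={"key1": "value1"})
print(p_json.headers["Content-Type"], json.loads(p_json.body))

# auth tuple: becomes an HTTP Basic Authorization header.
p_auth = prepare("GET", "http://python123.io/ws", auth=("user", "pass"))
print(p_auth.headers["Authorization"])
```

The remaining keyword arguments (`timeout`, `proxies`, `allow_redirects`, `stream`, `verify`, `cert`) affect how the request is sent rather than what it contains, so they do not show up on the prepared request.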

VI. Key concepts

1. How HTTP works

'''
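Data sent over the Internet has to follow agreed formats. When we browse web pages in a browser, that format is HTTP, the Hypertext Transfer Protocol. HTTP is mainly used to transfer hypertext (web pages and the like). Similar protocols include FTP, which is mainly used to transfer files.

To collect data from a particular computer, we first have to find that computer. HTTP uses URLs to locate a computer and the resources on it. For example, https://www.cnblogs.com/windyrainy/ is a URL; typing it into a browser opens the blog's home page. https is the protocol name; HTTPS is the encrypted version of HTTP. www.cnblogs.com is the server's domain name, which is resolved to an IP address, so the domain name is enough to locate the cnblogs server on the Internet. Finally, the path /windyrainy identifies a resource within that server's website.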
'''
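
The URL breakdown described above can be reproduced with the standard library. This is a small sketch using `urllib.parse.urlsplit`; resolving the domain name to an IP address would need a network lookup and is left out.

```python
from urllib.parse import urlsplit

parts = urlsplit("https://www.cnblogs.com/windyrainy/")

print(parts.scheme)    # protocol name
print(parts.hostname)  # domain name of the server
print(parts.path)      # resource path on that server
print(parts.port)      # None when the URL relies on the scheme's default port
```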

2. HTTP requests

'''
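When we type a URL into the browser and press Enter, the page appears almost at once, but a fairly involved process lies behind it. What we need to know is this: the browser sends an HTTP request to the server the URL points to; the server processes the request and returns a response. The browser parses and renders the source code and other content in the response, and we then see the page in all its richness.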
'''

A request consists of 4 parts (request line + request headers + blank line + request body):

'''
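① is the request method. GET and POST are the most common HTTP methods; others include DELETE, HEAD, OPTIONS, PUT, and TRACE. HTML forms in browsers, however, can only submit GET and POST. Spring 3.0 provides HiddenHttpMethodFilter, which lets you name one of the other methods through a "_method" form parameter (the form is in fact still submitted with POST). Once HiddenHttpMethodFilter is configured on the server, Spring simulates the HTTP method named by _method, so handler methods can be mapped to those HTTP methods.

② is the URL path of the request; together with the Host header it forms the complete request URL.

③ is the protocol name and version.

④ is the HTTP headers. There are a number of header fields, each in the form "name: value", from which the server learns about the client.

⑤ is the message body. It encodes the values of a page's form fields as key-value pairs, param1=value1&param2=value2, in a single formatted string that carries multiple request parameters. Parameters can be passed not only in the body but also in the request URL, as in "/chapter15/user.html?param1=value1&param2=value2".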
'''

Message structure
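
The four parts can be made concrete by assembling a request message as text and splitting it apart again. The following is a simplified sketch, not a full HTTP parser; `build_request`, `parse_request`, and the host `www.example.com` are placeholders invented for this example.

```python
def build_request(method, target, headers, body=""):
    """Serialize a request: request line, headers, blank line, body."""
    lines = [f"{method} {target} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Each line ends with CRLF; an empty line separates headers from body.
    return "\r\n".join(lines) + "\r\n\r\n" + body


def parse_request(raw):
    """Split a raw request back into its parts."""
    head, _, body = raw.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    method, target, version = request_line.split(" ", 2)
    headers = dict(line.split(": ", 1) for line in header_lines)
    return method, target, version, headers, body


form = "param1=value1&param2=value2"
raw = build_request(
    "POST",
    "/chapter15/user.html",
    {
        "Host": "www.example.com",  # placeholder host, not from the article
        "Content-Type": "application/x-www-form-urlencoded",
        "Content-Length": str(len(form)),
    },
    form,
)
method, target, version, headers, body = parse_request(raw)
print(method, target, version)
print(headers["Host"], body)
```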

Breaking the request message above down further gives a more detailed picture of its structure:

'''
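1)  The request line
The request line consists of three tokens separated by spaces: the request method, the request URI, and the HTTP version.
Example: GET /index.html HTTP/1.1
The HTTP/1.1 specification defines 8 request methods (PATCH was added later by RFC 5789):
GET        a simple request to retrieve the resource identified by the URI
HEAD       same as GET, but the server returns only the status line and headers, not the document
POST       asks the server to accept the data the client writes into the request body
PUT        asks the server to store the request data as the new content of the given URI
DELETE     asks the server to delete the resource named by the URI
OPTIONS    asks which request methods the server supports
TRACE      asks the web server to echo back the HTTP request and its headers
CONNECT    reserved for tunneling, for example an HTTPS connection through a proxy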

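2)  Headers (the list covers both request and response headers)
1. Accept: tells the web server which media types the client accepts. */* means any type, type/* means all subtypes of that type, and type/sub-type means one specific type.
2. Accept-Charset: the character sets the browser accepts.
    Accept-Encoding: the content encodings the browser accepts, usually compression methods: whether compression is supported and which methods (gzip, deflate).
    Accept-Language: the languages the browser accepts. Language and character set are different things: Chinese is a language, and it has several character sets, such as big5, gb2312, and gbk.
3. Accept-Ranges: the web server states whether it accepts requests for part of an entity (for example part of a file). bytes means it does; none means it does not.
4. Age: when a proxy answers a request from its cache, this header states how much time has passed since the entity was generated.
5. Authorization: after the client receives a WWW-Authenticate response from the web server, it uses this header to send its credentials back to the server.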
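6. Cache-Control:
    In requests:
    no-cache (do not use a cached entity; fetch it from the web server now)
    max-age (accept only objects whose Age is less than max-age and that have not expired)
    max-stale (expired objects are acceptable, as long as they have been expired for less than max-stale)
    min-fresh (accept a cached object only if its freshness lifetime is greater than its current Age plus min-fresh)
    In responses:
    public (the cached content may be used to answer any user)
    private (the cached content may only be used to answer the user who originally requested it)
    no-cache (may be cached, but must be revalidated with the web server before being returned to a client)
    max-age (the lifetime of the object in this response, in seconds)
    In both: no-store (caching is not allowed)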
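7. Connection:
    In requests:
    close (tells the web server or proxy to close the connection after responding to this request, without waiting for further requests on it).
    keep-alive (tells the web server or proxy to keep the connection open after responding to this request and wait for further requests on it).
    In responses:
    close (the connection has been closed).
    keep-alive (the connection is kept open, waiting for further requests on it).
    Keep-Alive: if the browser asked to keep the connection open, this header states how long (in seconds) it would like the web server to keep it open.
    Example: Keep-Alive: 300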
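8. Content-Encoding: the web server states which compression method (gzip, deflate) it applied to the object in the response.
    Example: Content-Encoding: gzip
    Content-Language: the web server tells the browser the language of the object in the response.
    Content-Length: the web server tells the browser the length of the object in the response.
    Example: Content-Length: 26012
    Content-Range: the web server states which part of the whole object this partial response contains.
    Example: Content-Range: bytes 21010-47021/47022
    Content-Type: the web server tells the browser the type of the object in the response.
    Example: Content-Type: application/xml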
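9. ETag: an identifying value for an object (such as a URL). Take an HTML file: if it is modified, its ETag changes too. ETag therefore serves much the same purpose as Last-Modified, chiefly letting the web server determine whether an object has changed. For instance, the browser obtains a file's ETag the first time it requests an HTML file; on the next request for the same file it sends that ETag to the web server, which compares it with the file's current ETag and so knows whether the file has changed.
10. Expires: the web server states when the entity expires. An expired object may only be used to answer client requests after it has been revalidated with the web server. This is an HTTP/1.0 header.
    Example: Expires: Sat, 23 May 2009 10:02:12 GMT
11. Host: the client specifies the domain name or IP address, and the port, of the web server it wants to reach.
    Example: Host: rss.sina.com.cn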
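12. If-Match: perform the requested action only if the object's ETag has not changed, which means the object itself has not changed.
    If-None-Match: perform the requested action only if the object's ETag has changed, which means the object itself has changed.
13. If-Modified-Since: perform the requested action (such as returning the object) only if the object has been modified after the time given in this header; otherwise return status 304 to tell the browser the object is unmodified.
    Example: If-Modified-Since: Thu, 10 Apr 2008 09:14:42 GMT
    If-Unmodified-Since: perform the requested action (such as returning the object) only if the object has not been modified after the time given in this header.
14. If-Range: the browser tells the web server: if the object I asked for has not changed, send me just the part I am missing; if it has changed, send me the whole object. The browser sends the object's ETag, or the last-modified time it knows of, so that the server can decide whether the object has changed. Always used together with the Range header.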
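15. Last-Modified: the time the web server considers the object to have last been modified, such as a file's last modification time or the time a dynamic page was last generated.
    Example: Last-Modified: Tue, 06 May 2008 02:42:43 GMT
16. Location: the web server tells the browser that the object it tried to access has moved, and that it should fetch it from the location given in this header.
    Example: Location: http://i0.sinaimg.cn/dy/deco/2008/0528/sinahome_0803_ws_005_text_0.gif
17. Pragma: mainly used as Pragma: no-cache, which is equivalent to Cache-Control: no-cache.
    Example: Pragma: no-cache
18. Proxy-Authenticate: a proxy's response to the browser, asking it to supply proxy credentials.
    Proxy-Authorization: the browser's answer to the proxy's authentication request, supplying its credentials.
19. Range: the browser (for example FlashGet doing a multi-threaded download) tells the web server which part of the object it wants.
    Example: Range: bytes=1173546-
20. Referer: the browser tells the web server which page/URL the URL in the current request was obtained or clicked from.
    Example: Referer: http://www.sina.com/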
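21. Server: the web server states what software it is, and its version.
    Example: Server: Apache/2.0.61 (Unix)
22. User-Agent: the browser identifies itself (which browser it is).
    Example: User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN;rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14
23. Transfer-Encoding: the web server states what encoding it applied to the message body of this response (not to the object inside it), for example chunked transfer.
    Example: Transfer-Encoding: chunked
24. Vary: the web server uses this header to tell caches under what conditions the object returned in this response may be used to answer later requests. Suppose the origin server's response to the first request carries Content-Encoding: gzip and Vary: Accept-Encoding. The cache then inspects the Accept-Encoding header of each later request and reuses the stored response only if it matches that of the original request, that is, only if the same content encoding is acceptable. This prevents the cache from serving a compressed entity to a browser that cannot decompress it.
    Example: Vary: Accept-Encoding
25. Via: lists the proxies that a request from the client to the origin server (OCS), or a response in the opposite direction, passed through, and the protocol (and version) each used. When the client's request reaches the first proxy, that proxy adds a Via header with its own details to the request it forwards; the next proxy copies that Via header into its own outgoing request and appends its own details; and so on. When the origin server receives the request from the last proxy, the Via header shows the route the request took.
    Example: Via: 1.0 236-81.D07071953.sina.com.cn:80 (squid/2.6.STABLE13)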
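3)  The blank line
After the last request header comes a blank line: a carriage return and line feed (CRLF) that tells the server no more headers follow.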
'''

Message contents
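
The ETag and If-None-Match description above amounts to a simple comparison on the server side. Here is a deliberately simplified sketch: `respond` is a hypothetical function, and it ignores details a real server handles, such as `*`, weak validators, and lists of several ETags.

```python
def respond(resource_etag, body, request_headers):
    """Return (status, headers, body) for a GET, honoring If-None-Match."""
    if request_headers.get("If-None-Match") == resource_etag:
        # The client's cached copy is still current: send no body.
        return 304, {"ETag": resource_etag}, b""
    headers = {"ETag": resource_etag, "Content-Length": str(len(body))}
    return 200, headers, body


page = b"<html>hello</html>"

# First visit: the client has no ETag yet and receives the full page.
status, headers, _ = respond('"v1"', page, {})
print(status, headers)

# Second visit: the client replays the ETag it was given.
status, _, payload = respond('"v1"', page, {"If-None-Match": headers["ETag"]})
print(status, payload)

# After the file changes, the server's ETag differs and the page is resent.
status, _, payload = respond('"v2"', page, {"If-None-Match": '"v1"'})
print(status, len(payload))
```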

Example HTTP request headers:

'''
Host: rss.sina.com.cn
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: zh-cn,zh;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: gb2312,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Cookie: userId=C5bYpXrimdmsiQmsBPnE1Vn8ZQmdWSm3WRlEB3vRwTnRtW <-- Cookie
If-Modified-Since: Sun, 01 Jun 2008 12:05:30 GMT
Cache-Control: max-age=0
'''

3. HTTP responses

An HTTP response message also consists of 4 parts (status line + response headers + blank line + response body):

'''
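1)  The status line
The status line consists of three tokens: the HTTP version, the status code, and the reason phrase.
HTTP version: tells the client the highest version the server understands.
Status code: a 3-digit code indicating whether the request succeeded and, if it failed, why.
Reason phrase: a human-readable explanation of the status code.
Example: HTTP/1.1 200 OK
HTTP status codes:
1xx Informational: typically tells the client the request was received and is being processed.
2xx Success: the request was received, understood, accepted, or fully processed.
3xx Redirection: the client must issue another request to complete the operation.
4xx Client error: processing failed and the client is at fault, e.g. the requested resource does not exist, the client is not authorized, or access is forbidden.
5xx Server error: processing failed and the server is at fault, e.g. the server threw an exception, routing failed, or the HTTP version is not supported.
1xx: 100 Continue, 101 Switching Protocols
2xx: 200 OK, 201 Created, 202 Accepted, 203 Non-Authoritative Information, 204 No Content, 205 Reset Content, 206 Partial Content
3xx: 300 Multiple Choices, 301 Moved Permanently, 302 Found, 303 See Other, 304 Not Modified, 305 Use Proxy, 307 Temporary Redirect
4xx: 400 Bad Request, 401 Unauthorized, 402 Payment Required, 403 Forbidden, 404 Not Found, 405 Method Not Allowed, 406 Not Acceptable, 407 Proxy Authentication Required, 408 Request Timeout, 409 Conflict, 410 Gone, 411 Length Required, 412 Precondition Failed, 413 Request Entity Too Large, 414 Request-URI Too Long, 415 Unsupported Media Type, 416 Requested Range Not Satisfiable, 417 Expectation Failed
5xx: 500 Internal Server Error, 501 Not Implemented, 502 Bad Gateway, 503 Service Unavailable, 504 Gateway Timeout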

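2)  Response headers
Like request headers, they describe the server's capabilities and give details about the response data.

3)  The blank line
After the last response header comes a blank line (CRLF), indicating that no more headers follow.

4)  The response body
The response data: an HTML document, an image, and so on; for a web page, the HTML itself.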
'''

Message structure
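
As a small illustration of the status line and the five status classes, the sketch below splits a status line into its three tokens and looks up reason phrases with the standard library's `http.HTTPStatus`. The helper names `parse_status_line` and `status_class` are made up for this example.

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def parse_status_line(line):
    """Split 'HTTP/1.1 200 OK' into (version, code, reason phrase)."""
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason


def status_class(code):
    """Map a status code to its class using the first digit."""
    return STATUS_CLASSES[code // 100]


version, code, reason = parse_status_line("HTTP/1.1 200 OK")
print(version, code, reason, status_class(code))

for code in (304, 404, 503):
    print(code, HTTPStatus(code).phrase, status_class(code))
```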

Example HTTP response headers:

'''
Status: OK - 200 <-- response status code: the result of the web server's processing
Date: Sun, 01 Jun 2008 12:35:47 GMT
Server: Apache/2.0.61 (Unix)
Last-Modified: Sun, 01 Jun 2008 12:35:30 GMT
Accept-Ranges: bytes
Content-Length: 18616
Cache-Control: max-age=120
Expires: Sun, 01 Jun 2008 12:37:47 GMT
Content-Type: application/xml
Age: 2
X-Cache: HIT from 236-41.D07071951.sina.com.cn <-- HTTP header added by the reverse proxy
Via: 1.0 236-41.D07071951.sina.com.cn:80 (squid/2.6.STABLE13)
Connection: close
'''
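
The caching headers in this example are consistent with one another, which can be checked with a little date arithmetic. The sketch below copies the relevant header values from the example and uses only the standard library; it is a simplified reading of the caching rules, not a full freshness calculation.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime

# Header values copied from the response example above.
headers = {
    "Date": "Sun, 01 Jun 2008 12:35:47 GMT",
    "Cache-Control": "max-age=120",
    "Expires": "Sun, 01 Jun 2008 12:37:47 GMT",
    "Age": "2",
}

max_age = int(headers["Cache-Control"].split("=", 1)[1])
generated_at = parsedate_to_datetime(headers["Date"])

# Date plus max-age gives the moment the response stops being fresh.
expires_at = generated_at + timedelta(seconds=max_age)
print(expires_at == parsedate_to_datetime(headers["Expires"]))

# The cached copy is already Age seconds old when it is served.
remaining = max_age - int(headers["Age"])
print(remaining, "seconds of freshness left")
```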

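4. Session and Cookies

On some websites, such as shopping sites, we usually have to log in first; after that we can keep browsing the site and add the things we want to buy to the cart. Sometimes, though, after a period of inactivity we have to log in again. And on some sites we are already logged in the moment the page opens. These features look like magic, but they are all the work of sessions and cookies.

Overview

1. Stateless HTTP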

'''
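HTTP has one notable property: it is stateless. This means the protocol has no memory across transactions. We send a request, the server processes it and returns a result, and that is one self-contained exchange; the next request and response form another. There is no dedicated wire keeping your computer and the server connected for all of your requests. The server therefore cannot tell whether two requests came from the same user. That is not what we want: to preserve state we would have to resend the data from all previous requests every time, which is tedious and wasteful. Sessions and cookies were introduced to solve this by maintaining state across HTTP requests.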
'''

2. Sessions and cookies

'''
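A session is the series of requests from the moment we open a website until we close the browser. When we open the Taobao site, for example, its server creates and stores a session object for us that holds information about the user; after we log in, the session holds our account information. A session has a limited lifetime: if we do not visit the site for a long time (longer than the session timeout) or close the browser, the server deletes the session object.

Cookies are data that a website stores on the user's device in order to identify the user and track the session. They are usually stored as text in files on the computer, and they consist of key-value pairs.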
'''

3. Maintaining a session

'''
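When the browser first requests the server, the server sets a Set-Cookie header in the response to mark the user's identity. The browser saves the cookie, which holds the session ID. When the browser requests the site again, it puts the cookie in the request headers; the server inspects it to find the matching session and then uses the session to determine the user's state.

After a successful login, the site tells the client which cookies to set in order to stay logged in. If the cookies the browser sends are invalid or the session has expired, the client may receive an error response or be redirected to the login page.

What cookies and sessions have in common: both are ways of tracking the identity of a browser user across a session.
How they differ: cookie data is stored on the client; session data is stored on the server.
Cookies are not very secure: someone can analyze the cookies stored locally and forge them. When security is the main concern, use sessions; nothing is absolutely secure, but sessions are safer than cookies.
Sessions are kept on the server for a period of time. As traffic grows they consume more server resources; when reducing server load is the main concern, use cookies.
Each has strengths and weaknesses, so store important information such as login state in the session, and put other information that needs to be kept in cookies.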
'''
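
The session-ID round trip described above can be sketched with the standard library alone. This is a toy model, not how any particular framework does it: the in-memory `SESSIONS` dict, the cookie name `sessionid`, and the `handle_login` / `handle_request` functions are all assumptions made for illustration.

```python
import secrets
from http.cookies import SimpleCookie

# Server-side session store: session ID -> session data (toy, in memory).
SESSIONS = {}


def handle_login(username):
    """Create a session and return the Set-Cookie header line for it."""
    session_id = secrets.token_hex(16)
    SESSIONS[session_id] = {"user": username}
    cookie = SimpleCookie()
    cookie["sessionid"] = session_id
    cookie["sessionid"]["path"] = "/"
    cookie["sessionid"]["httponly"] = True
    return cookie.output(header="Set-Cookie:")


def handle_request(cookie_header):
    """Look up the session named by the request's Cookie header value."""
    cookie = SimpleCookie(cookie_header)
    morsel = cookie.get("sessionid")
    return SESSIONS.get(morsel.value) if morsel else None


# The server answers the login with a Set-Cookie header...
set_cookie_line = handle_login("alice")
print(set_cookie_line)

# ...and the client sends the name=value part back on later requests.
cookie_value = set_cookie_line.split(": ", 1)[1].split(";", 1)[0]
print(handle_request(cookie_value))

# An unknown or expired session ID finds no session.
print(handle_request("sessionid=bogus"))
```

Only the random session ID travels in the cookie; the user data stays in the server-side store, which is the division of labor the comparison above describes.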
