Werkzeug source-code reading notes (2), part 1

Reposted article, 2015-06-30 22:32. Tags: python / werkzeug / web / wsgi

Part one covered initialization, so I didn't publish it.

`wsgi.py`: part one

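Before analyzing this module you should have a rough understanding of WSGI (the interface specified in PEP 3333); once you do, read on.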

The `get_current_url()` function

As the name says, this function returns the URL of the current request. The code:

```python
def get_current_url(environ, root_only=False, strip_querystring=False,
                    host_only=False, trusted_hosts=None):
    """
    :param environ: the WSGI environment to get the current URL from.
    :param root_only: set `True` if you only want the root URL.
    :param strip_querystring: set to `True` if you don't want the querystring.
    :param host_only: set to `True` if the host URL should be returned.
    :param trusted_hosts: a list of trusted hosts, see :func:`host_is_trusted`
                          for more information.
    """
    tmp = [environ['wsgi.url_scheme'], '://', get_host(environ, trusted_hosts)]
    cat = tmp.append
    if host_only:
        return uri_to_iri(''.join(tmp) + '/')
    # The next two lines turn tmp into the root URL (what root_only returns)
    cat(url_quote(wsgi_get_bytes(environ.get('SCRIPT_NAME', ''))).rstrip('/'))
    cat('/')
    if not root_only:
        cat(url_quote(wsgi_get_bytes(environ.get('PATH_INFO', '')).lstrip(b'/')))
        if not strip_querystring:
            qs = get_query_string(environ)
            if qs:
                cat('?' + qs)
    return uri_to_iri(''.join(tmp))
```

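Note the first two statements, `tmp = [...]` and `cat = tmp.append`. At first I didn't understand what the bare `tmp.append` (not called) was for, and I couldn't find an explanation online, so I experimented: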

```python
>>> temp = [1,2,3]
>>> temp
[1, 2, 3]
>>> aa = temp.append
>>> aa(2)
>>> temp
[1, 2, 3, 2]
```

So after `aa = temp.append`, `aa` is a bound method of that particular list: calling `aa(2)` is equivalent to `temp.append(2)`.
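
A minimal sketch of the same idiom in isolation (the URL pieces are made-up values): the bound method stored in `cat` keeps a reference to the list it was taken from, so every `cat(x)` appends to `tmp`, and the parts are joined once at the end.

```python
# Build a URL piece by piece, the way get_current_url does with tmp/cat.
tmp = ['http', '://', 'example.com']
cat = tmp.append          # bound method: calling cat(x) runs tmp.append(x)

# The bound method keeps a reference to the list it was taken from.
assert cat.__self__ is tmp

cat('/app')
cat('/')
url = ''.join(tmp)        # join once at the end instead of repeated +
print(url)                # http://example.com/app/
```

Presumably the point of `cat = tmp.append` is just shorter lines and skipping the attribute lookup on each call; the behavior is identical to calling `tmp.append` directly.
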
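
The `host_only` parameter returns only the scheme and host: for `http://www.baidu.com/xxx` it gives `http://www.baidu.com/` (the code appends a trailing `/`).
At the end the function returns `uri_to_iri(...)`, which converts the URI into an IRI. An IRI may contain Unicode characters while a URI is restricted to ASCII, so, for example, percent-encoded UTF-8 in the path is turned back into readable characters.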
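
To see how the three flags change the result, here is a simplified, stdlib-only sketch of the same logic. It is not werkzeug's code: `simple_current_url` is a hypothetical name, the host is taken from `HTTP_HOST` (falling back to `SERVER_NAME`) without any `trusted_hosts` check, `urllib.parse.quote` stands in for `url_quote`, the query string is appended as-is, and the `uri_to_iri` step is skipped, so the result stays an ASCII URI.

```python
from urllib.parse import quote


def simple_current_url(environ, root_only=False, strip_querystring=False,
                       host_only=False):
    """Hypothetical, simplified version of get_current_url (no IRI step)."""
    # Assumption: no trusted_hosts / X-Forwarded-Host handling.
    host = environ.get('HTTP_HOST') or environ['SERVER_NAME']
    tmp = [environ['wsgi.url_scheme'], '://', host]
    cat = tmp.append
    if host_only:
        return ''.join(tmp) + '/'
    # Root URL: scheme + host + SCRIPT_NAME, always ending with '/'.
    cat(quote(environ.get('SCRIPT_NAME', '')).rstrip('/'))
    cat('/')
    if not root_only:
        cat(quote(environ.get('PATH_INFO', '').lstrip('/')))
        if not strip_querystring:
            qs = environ.get('QUERY_STRING', '')
            if qs:
                cat('?' + qs)
    return ''.join(tmp)


environ = {
    'wsgi.url_scheme': 'http',
    'HTTP_HOST': 'www.baidu.com',
    'SCRIPT_NAME': '/app',
    'PATH_INFO': '/xxx',
    'QUERY_STRING': 'a=1',
}
print(simple_current_url(environ))                          # http://www.baidu.com/app/xxx?a=1
print(simple_current_url(environ, strip_querystring=True))  # http://www.baidu.com/app/xxx
print(simple_current_url(environ, root_only=True))          # http://www.baidu.com/app/
print(simple_current_url(environ, host_only=True))          # http://www.baidu.com/
```
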

The `get_query_string()` function

`wsgi.py` contains many similar functions, each returning one part of the URL. I'll analyze one of them here; the others work much the same way.

```python
def get_query_string(environ):
    qs = wsgi_get_bytes(environ.get('QUERY_STRING', ''))
    # QUERY_STRING really should be ascii safe but some browsers
    # will send us some unicode stuff (I am looking at you IE).
    # In that case we want to urllib quote it badly.
    # (My note) Compare urllib.parse.quote(): bytes that are not URL-safe are
    # replaced by %XX escapes; characters listed in `safe` are left as they are.
    return try_coerce_native(url_quote(qs, safe=':&%=+$!*\'(),'))
```

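`get_query_string(environ)` takes `QUERY_STRING` from the environ and turns it back into raw bytes with `wsgi_get_bytes`. Under PEP 3333, environ values on Python 3 are native strings whose characters are the request bytes decoded as latin-1, so encoding them with latin-1 recovers the original bytes unchanged. As the source comment says, the query string ought to be ASCII already, but some browsers (IE in particular) send raw non-ASCII bytes, so the function percent-encodes every byte outside the safe set to guarantee an ASCII-only result.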
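
The two steps can be reproduced with the standard library. A minimal sketch, assuming Python 3: the environ dict is made up, `.encode('latin-1')` plays the role of `wsgi_get_bytes`, and `urllib.parse.quote` stands in for werkzeug's `url_quote` (both pass letters, digits and `-._~` through, but I would not assume they agree in every corner case).

```python
from urllib.parse import quote

# Under PEP 3333 (Python 3), environ values are str objects whose characters
# are the raw request bytes decoded as latin-1. Simulate a browser that sent
# the UTF-8 bytes of '中' unescaped in the query string.
raw = b'q=' + '中'.encode('utf-8')
environ = {'QUERY_STRING': raw.decode('latin-1')}

# Step 1: recover the original bytes (the job of wsgi_get_bytes on Python 3).
qs_bytes = environ['QUERY_STRING'].encode('latin-1')
assert qs_bytes == raw

# Step 2: percent-encode every byte outside the safe set -> pure ASCII.
qs = quote(qs_bytes, safe=":&%=+$!*'(),")
print(qs)  # q=%E4%B8%AD
```
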
Next, the return statement calls `url_quote`; here is its source:

```python
def url_quote(string, charset='utf-8', errors='strict', safe='/:', unsafe=''):
    """URL encode a single string with a given encoding."""

    if not isinstance(string, (text_type, bytes, bytearray)):
        string = text_type(string)
    if isinstance(string, text_type):
        string = string.encode(charset, errors)
    if isinstance(safe, text_type):
        safe = safe.encode(charset, errors)
    if isinstance(unsafe, text_type):
        unsafe = unsafe.encode(charset, errors)
    # Safe byte values = safe + _always_safe, minus unsafe (a frozenset of ints)
    safe = frozenset(bytearray(safe) + _always_safe) - frozenset(bytearray(unsafe))
    rv = bytearray()
    for char in bytearray(string):
        if char in safe:
            rv.append(char)
        else:
            rv.extend(('%%%02X' % char).encode('ascii'))
    return to_native(bytes(rv))
```

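From the code: if `string` is neither text nor bytes it is first converted to text (a number becomes its decimal text, for example); then `string`, `safe` and `unsafe`, if they are text, are encoded to bytes with `charset` (UTF-8 by default, configurable). Finally `string` is walked byte by byte through a `bytearray`, and each byte is either copied as-is or percent-encoded.
`try_coerce_native`: on Python 3 the source defines `try_coerce_native = _identity` with `_identity = lambda x: x`, so `try_coerce_native(a)` simply returns `a`.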
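
For experimenting, here is a stand-alone Python 3 re-implementation of the loop above. `sketch_url_quote` and `ALWAYS_SAFE` are hypothetical names; I am assuming the always-safe bytes are the RFC 3986 unreserved characters (letters, digits, `-._~`), and `bytes(...).decode('ascii')` stands in for `to_native`.

```python
# Assumed always-safe set: RFC 3986 "unreserved" characters.
# werkzeug's own _always_safe may differ slightly.
ALWAYS_SAFE = bytearray(b'abcdefghijklmnopqrstuvwxyz'
                        b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
                        b'0123456789-._~')


def sketch_url_quote(string, charset='utf-8', errors='strict',
                     safe='/:', unsafe=''):
    """Hypothetical re-implementation of the url_quote logic (Python 3)."""
    if not isinstance(string, (str, bytes, bytearray)):
        string = str(string)              # e.g. 42 -> '42'
    if isinstance(string, str):
        string = string.encode(charset, errors)
    if isinstance(safe, str):
        safe = safe.encode(charset, errors)
    if isinstance(unsafe, str):
        unsafe = unsafe.encode(charset, errors)
    # frozenset of byte values (ints): safe + always-safe, minus unsafe
    safe_set = frozenset(bytearray(safe) + ALWAYS_SAFE) - frozenset(bytearray(unsafe))
    rv = bytearray()
    for byte in bytearray(string):        # iterating a bytearray yields ints
        if byte in safe_set:
            rv.append(byte)
        else:
            rv.extend(('%%%02X' % byte).encode('ascii'))
    return bytes(rv).decode('ascii')      # stands in for to_native()


print(sketch_url_quote('/a b/中'))        # /a%20b/%E4%B8%AD
print(sketch_url_quote('a/b', unsafe='/')) # a%2Fb
print(sketch_url_quote(42))               # 42
```

Passing `unsafe='/'` shows why the set difference exists: it lets a caller force-escape a character that is otherwise in the default `safe` set.
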

This snippet also relies on something important: `bytearray()`.

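According to the documentation, `bytearray(source, encoding, errors)` takes three optional parameters: `source` is the content to convert, `encoding` is the encoding to use when `source` is a `str`, and `errors` decides how characters that cannot be encoded are handled.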
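
A short sketch of the three parameters with a `str` source (the sample strings are arbitrary). Passing `encoding` directly is equivalent to calling `.encode()` first, and `errors` behaves as it does for `str.encode`.

```python
# With a str source, encoding is required; errors defaults to 'strict'.
a = bytearray('aaaa', 'utf-8')             # same as bytearray('aaaa'.encode('utf-8'))
print(a)                                   # bytearray(b'aaaa')

b = bytearray('café', 'ascii', 'ignore')   # 'é' is not ASCII: dropped
print(b)                                   # bytearray(b'caf')

c = bytearray('café', 'ascii', 'replace')  # 'é' becomes '?'
print(c)                                   # bytearray(b'caf?')

try:
    bytearray('café', 'ascii')             # strict: raises
except UnicodeEncodeError as exc:
    print('strict:', exc.reason)           # strict: ordinal not in range(128)
```
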

To get a feel for `bytearray`, I wrote the following:

```python
>>> string = 'aaaa'
>>> temp = bytearray(string)
Traceback (most recent call last):
  File "<pyshell#50>", line 1, in <module>
    temp = bytearray(string)
TypeError: string argument without an encoding
```

The error says an encoding is required, so I fixed that:

```python
>>> string = 'aaaa'.encode('utf-8')
>>> temp = bytearray(string)
>>> print(temp)
bytearray(b'aaaa')    # note the b prefix: this is binary data, not text
```

That worked. Next I tried this:

```python
>>> for i in temp:
...     print(i, end=' ')
...
97 97 97 97
```

That is not what I expected: why doesn't it print four `a`s?

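The reason is that iterating over a `bytearray` (or indexing it) yields integers, the byte values 0 to 255, not characters. 97 is the code of `a` in ASCII, and UTF-8 encodes ASCII characters as that same single byte, so each `a` shows up as 97.
The example also shows that `bytearray` objects are iterable.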
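
A few more lines make the point concrete: indexing or iterating yields `int`s, and characters come back via `chr()` (only safe byte by byte for ASCII) or `.decode()`. The sample strings are arbitrary.

```python
temp = bytearray('aaaa'.encode('utf-8'))

# Iterating or indexing a bytearray yields ints in range 0..255.
values = list(temp)
print(values)                          # [97, 97, 97, 97]
print(temp[0], chr(temp[0]))           # 97 a

# Getting characters back: chr() per byte (ASCII only) or decode().
print(''.join(chr(b) for b in temp))   # aaaa
print(temp.decode('utf-8'))            # aaaa

# A non-ASCII character takes several bytes in UTF-8.
zh = bytearray('中'.encode('utf-8'))
print(list(zh))                        # [228, 184, 173]
print(zh.decode('utf-8'))              # 中
```
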

With that, the purpose of `url_quote()` becomes clear:

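The function first encodes `string`, `safe` and `unsafe` to bytes (UTF-8 by default), builds the set of safe byte values (the given `safe` bytes plus the built-in `_always_safe` ones, minus `unsafe`), and then walks `string` byte by byte. A byte in the safe set is copied unchanged; any other byte goes through `rv.extend(('%%%02X' % char).encode('ascii'))`, i.e. it is replaced by a `%XX` escape. That is how the query-string part of the URL gets converted (for why this is needed, see the comment in `get_query_string`).
In `'%%%02X' % char`, the leading `%%` produces a literal `%`, and `%02X` works as in C: it prints the integer in uppercase hexadecimal, at least two digits wide, zero-padded on the left (so 10 becomes `0A` and 32 becomes `20`).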
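
A quick check of the format string; the byte values are just examples (228 is 0xE4, the first UTF-8 byte of `中`).

```python
# '%%' -> literal '%'; '%02X' -> uppercase hex, at least 2 digits, zero-padded.
escapes = ['%%%02X' % byte for byte in (10, 32, 228)]
print(escapes)                  # ['%0A', '%20', '%E4']

# Decimal '%02d' is a different conversion and is not what url_quote uses.
print('%02d' % 10)              # 10
print('%02X' % 10)              # 0A

# The same escape written with str.format():
print('%{:02X}'.format(228))    # %E4
```
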

