Python的url解析庫--urlparse

本文轉載自查看原文 2017-03-22 21:27 16204 Python基礎

一、urlparse解析url的query並構建字典

下面的方法主要的功能：

解析url的各個部分，並能夠獲取url的query部分，並把query部分構建成dict。

具體的代碼實現：

>>> import urlparse
>>> url = "http://www.example.org/default.html?ct=32&op=92&item=98"
>>> urlparse.urlsplit(url)
SplitResult(scheme='http', netloc='www.example.org', path='/default.html', query='ct=32&op=92&item=98', fragment='')
>>> urlparse.parse_qs(urlparse.urlsplit(url).query)
{'item': ['98'], 'op': ['92'], 'ct': ['32']}
>>> dict(urlparse.parse_qsl(urlparse.urlsplit(url).query))
{'item': '98', 'op': '92', 'ct': '32'}
>>>

注意：

在Python3中， urlparse已經被移動到urllib.parse中。

在urlparse中有兩個函數：urlparse.parse_qs()和urlparse.parse_qsl()。這兩個函數都能解析url中的query字段。如果url的query中有同一個key對應多個value，其中urlparse.parse_qs()可以把該相同key的value放在一個list中。

有時間測試一下，如果url的query中有同一個key對應多個value，那么服務端要怎樣接收。

import urlparse    
url=urlparse.urlparse('http://www.baidu.com/index.php?username=guol')
>>> print url
    ParseResult(scheme='http', netloc='www.baidu.com', path='/index.php', params='', query='username=guol', fragment='')
>>> print url.netloc
    www.baidu.com

二、url解碼

有時url會進行編碼，例如搜索的中文關鍵詞會進行簡單的編碼，具體的解碼方法：

>>> import urlparse
>>> from urlparse import unquote
>>> url = "http://www.google.com/support/contact/bin/request.py?entity=%7B%22author%22:%22AIe9_BEW4fia2hKVVTrlUwNzhLS-jMdh3isj0rMd7_Cw85R1-YlRNFkUwoDyhH4aMj7AdHsW5A1po8BinbxspAuLBdB-or_3YzCMNXZKYrb50MIIJCZEpb4%22,%22groups%22:%5B%22general%22,%2254296%7C700726330%22%5D,%22trustedMerchantId%22:%22MID_54316%22%7D&amp;client=242&amp;contact_type=anno&amp;hl=en_US"
>>> a = urlparse.urlparse(url).query
>>> b = unquote(a)
>>> b
'entity={"author":"AIe9_BEW4fia2hKVVTrlUwNzhLS-jMdh3isj0rMd7_Cw85R1-YlRNFkUwoDyhH4aMj7AdHsW5A1po8BinbxspAuLBdB-or_3YzCMNXZKYrb50MIIJCZEpb4","groups":["general","54296|700726330"],"trustedMerchantId":"MID_54316"}&amp;client=242&amp;contact_type=anno&amp;hl=en_US'
>>> import HTMLParser
>>> html_parser = HTMLParser.HTMLParser()
>>> txt = html_parser.unescape(b)
>>> txt
u'entity={"author":"AIe9_BEW4fia2hKVVTrlUwNzhLS-jMdh3isj0rMd7_Cw85R1-YlRNFkUwoDyhH4aMj7AdHsW5A1po8BinbxspAuLBdB-or_3YzCMNXZKYrb50MIIJCZEpb4","groups":["general","54296|700726330"],"trustedMerchantId":"MID_54316"}&client=242&contact_type=anno&hl=en_US'
>>> c = urlparse.parse_qsl(txt, True)
>>> c   # c是一個list
[(u'entity', u'{"author":"AIe9_BEW4fia2hKVVTrlUwNzhLS-jMdh3isj0rMd7_Cw85R1-YlRNFkUwoDyhH4aMj7AdHsW5A1po8BinbxspAuLBdB-or_3YzCMNXZKYrb50MIIJCZEpb4","groups":["general","54296|700726330"],"trustedMerchantId":"MID_54316"}'), (u'client', u'242'), (u'contact_type', u'anno'), (u'hl', u'en_US')]
>>> import json
>>> c = dict(c)
>>> d = json.loads(c['entity'])
>>> d
{u'trustedMerchantId': u'MID_54316', u'groups': [u'general', u'54296|700726330'], u'author': u'AIe9_BEW4fia2hKVVTrlUwNzhLS-jMdh3isj0rMd7_Cw85R1-YlRNFkUwoDyhH4aMj7AdHsW5A1po8BinbxspAuLBdB-or_3YzCMNXZKYrb50MIIJCZEpb4'}
>>> print d['groups'][-1]
54296|700726330
>>>

注意：

使用urlparse.unquote把編碼的url解碼。

使用HTMLParser對url的特殊符號進行解碼。

把元組組成的list轉換成dict，每個元組的第一個元素為dict的key，第二個元素為dict的value。

免責聲明！

本站轉載的文章為個人學習借鑒使用，本站對版權不負任何法律責任。如果侵犯了您的隱私權益，請聯系本站郵箱yoyou2525@163.com刪除。

猜您在找 python——urlparse：解析url python url解析 Python 之lxml解析庫 Python3 url解碼與參數解析 Url解析 url解析 Python3 BeautifulSoup和Pyquery解析庫隨筆 Python解析庫lxml與xpath用法總結 python爬蟲中XPath和lxml解析庫 python解析庫lxml的簡單使用