The urlparse module (urllib.parse in Python 3) provides three functions for working with URL strings:
urlparse()
urlunparse()
urljoin()
1. urlparse() splits a URL string into a 6-element tuple: (scheme, netloc, path, params, query, fragment)
>>> from urllib.parse import urlparse
>>> url = "http://www.cnblogs.com/thunderLL/p/6643022.html?pid='8766352'"
>>> url_tuple = urlparse(url)
>>> for i, each in enumerate(url_tuple):
...     print(i, each)
...
0 http
1 www.cnblogs.com
2 /thunderLL/p/6643022.html
3
4 pid='8766352'
5
>>>
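The result of urlparse() is a named tuple (ParseResult), so each position can also be read by attribute name. The following is a minimal sketch of that; the names `parts` and `fields` are illustrative only and do not come from the original example:

```python
from urllib.parse import urlparse

url = "http://www.cnblogs.com/thunderLL/p/6643022.html?pid='8766352'"
parts = urlparse(url)

# Index-based and attribute-based access refer to the same six fields.
fields = ("scheme", "netloc", "path", "params", "query", "fragment")
for index, name in enumerate(fields):
    assert parts[index] == getattr(parts, name)
    print(index, name, repr(parts[index]))
```

Reading `parts.netloc` or `parts.query` is usually easier to follow than remembering that they sit at positions 1 and 4.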
2. urlunparse() assembles a 6-element URL tuple back into a URL string; it is the inverse of urlparse()
>>> from urllib.parse import urlunparse
>>> path = urlunparse(('http','www.cnblogs.com','/thunderLL/p/6643022.html','','pid=8766352',''))
>>> path
'http://www.cnblogs.com/thunderLL/p/6643022.html?pid=8766352'
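To illustrate the inverse relationship, here is a small sketch that parses a URL, rebuilds it, and then rebuilds a modified copy. The variable names and the replacement values (`https`, `pid=1`) are assumptions made for the example. The round trip is exact for this URL, although it is not guaranteed to be character-for-character identical for every input (for instance, an empty trailing `?` may be dropped):

```python
from urllib.parse import urlparse, urlunparse

original = "http://www.cnblogs.com/thunderLL/p/6643022.html?pid=8766352"
parts = urlparse(original)

# Round trip: unparsing the parsed result gives back the same string here.
rebuilt = urlunparse(parts)

# ParseResult is a named tuple, so _replace() returns a modified copy
# without touching the original parts.
changed = urlunparse(parts._replace(scheme="https", query="pid=1"))

print(rebuilt)
print(changed)
```

Using `_replace()` avoids rebuilding the whole 6-tuple by hand when only one or two components change.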
3. The urljoin() function
>>> from urllib.parse import urljoin
>>> url1 = urljoin('http://www.baidu.com/admin/','module-urllib2/request-objects.html')
>>> url1
'http://www.baidu.com/admin/module-urllib2/request-objects.html'
>>> url2 = urljoin('http://www.baidu.com/admin','module-urllib2/request-objects.html')
>>> url2
'http://www.baidu.com/module-urllib2/request-objects.html'
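urljoin() combines two URLs, a base URL and a relative URL. The results url1 and url2 differ only because one base URL ends with / and the other does not.
If the base URL has no trailing /, its last path segment (admin in this example) is replaced by the relative URL.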
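If the base URL should always be treated as a directory, one option is to add the trailing / before joining. The sketch below uses a hypothetical helper, `join_under`, which is not part of urllib.parse, and it assumes the base URL carries no query string or fragment:

```python
from urllib.parse import urljoin

def join_under(base, relative):
    """Hypothetical helper: treat base as a directory by ensuring a trailing '/'."""
    if not base.endswith("/"):
        base += "/"
    return urljoin(base, relative)

relative = "module-urllib2/request-objects.html"

# Both calls now keep the 'admin' segment, with or without the slash.
with_slash = join_under("http://www.baidu.com/admin/", relative)
without_slash = join_under("http://www.baidu.com/admin", relative)

print(with_slash)
print(without_slash)
```

With the helper, both base URLs produce the same joined address, whereas plain urljoin() drops `admin` when the slash is missing.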