20181206 (re, regular expressions, hashing)

Reposted article, 2018-12-06 19:56

1. re & regular expressions

2. hashlib

Part 1: The re module & regular expressions

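Regular expressions: a regular expression (regex) combines symbols that carry special meaning into a pattern that describes a character or a string. Put another way, a regex is a rule that describes a class of things. In Python, regex support is built in and provided by the re module. Note that it is re.findall, used in most of the examples here, that returns a list of matches; other functions such as re.search return a match object instead.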

import re
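\w matches a letter, digit, or underscore:
res = re.findall('alex','hahhaha alex is alex is dsb')  # Scans left to right; once a match succeeds, scanning resumes after the matched text, so matches never overlap. For example, in abcabcabca the matches for abca are the parenthesized parts: (abca)bc(abca)
print(res)
Output:
['alex', 'alex']
print(re.findall(r'\w','Aa12 _+-'))
Output:
['A', 'a', '1', '2', '_']
print(re.findall(r'\w\w','Aa12 _+-'))   # two consecutive letters/digits/underscores
Output:
['Aa', '12']
print(re.findall(r'\w9\w','Aa912 s9_+-'))  # three consecutive letters/digits/underscores, where the middle character must be 9
Output:
['a91', 's9_']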
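In Python 3, \w in a str pattern also matches Unicode word characters (Chinese characters, for example), not only ASCII letters. The following is a minimal sketch of that difference using the re.ASCII flag; the sample string is made up for illustration.

```python
import re

sample = '你好 abc_1 +-'  # hypothetical sample text

# Default (Unicode) mode: CJK characters count as word characters.
unicode_words = re.findall(r'\w+', sample)

# re.ASCII restricts \w to [a-zA-Z0-9_].
ascii_words = re.findall(r'\w+', sample, re.ASCII)

print(unicode_words)  # ['你好', 'abc_1']
print(ascii_words)    # ['abc_1']
```

If only ASCII identifiers should match, passing re.ASCII (or writing [A-Za-z0-9_] explicitly) is one way to get that behaviour.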

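print(re.findall('^alex','alex id alex'))   # ^ anchors the match at the start of the string
print(re.findall('^alex',' alex id alex'))  # this string starts with a space, so nothing matches
Output:
['alex']
[]


print(re.findall('alex$',' alex id alex'))  # $ anchors the match at the end of the string; if the string does not end with the pattern, an empty list is returned
Output:
['alex']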
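Using ^ and $ together forces a pattern to cover the whole string, which is a common way to validate input. A small sketch follows; the function name is_username and its rule (word characters only) are assumptions made for this example.

```python
import re

def is_username(text):
    """Return True if text consists only of word characters (hypothetical rule)."""
    # ^ and $ together force the pattern to span the entire string.
    return re.search(r'^\w+$', text) is not None

print(is_username('alex_01'))   # True
print(is_username('alex 01'))   # False: the space is not a word character
print(is_username('alex\n'))    # True: $ also matches just before a trailing newline
```

The last line shows a subtlety of $: if a trailing newline must be rejected as well, re.fullmatch(r'\w+', text) is a stricter alternative.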


. matches any single character except a newline
print(re.findall('a.c','a a1c aaac a c jkdsajfkd')) # three characters: a, then any character except a newline, then c
Output:
['a1c', 'aac', 'a c']

print(re.findall('a.c','a a1c aaac a c jkda\ncd')) # \n is a newline, so a\nc does not match
Output:
['a1c', 'aac', 'a c']
print(re.findall('a.c','a a1c aaac a c jkda\ncd',re.DOTALL)) # with re.DOTALL, . matches every character, including the newline (\n is treated as one ordinary character)
Output:
['a1c', 'aac', 'a c', 'a\nc']

[] matches exactly one character, taken from a set that you define. Note: [a,] means "a or a comma"; the comma is a literal member of the set, not a separator!
print(re.findall('a[0-9]c','a a1d a2ca aaa a\nc',re.DOTALL))  # three characters: a, one digit from 0 to 9, then c. The hyphen in [0-9] denotes a range ("from ... to ...").
Output:
['a2c']
print(re.findall('a[a-zA-Z]c','a aac a2ca aAc a\nc',re.DOTALL)) # a, one letter from a-z or A-Z, then c
Output:
['aac', 'aAc']
print(re.findall('a[ac]c','a aac a2ca aAc a\nc',re.DOTALL)) # a, then either a or c, then c
Output:
['aac']

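The following example raises re.error (bad character range), because a hyphen between two characters inside [] denotes a range, and +-# would run from a higher code point down to a lower one:
print(re.findall('a[+-#/]c','a+c a-c a/c a*c a3c a\nc',re.DOTALL))
The fix is to write [-+#/], [+#/-], or [+\-#/]: move the hyphen to the start or the end of the brackets, or escape it with a backslash.
print(re.findall(r'a[+\-#/]c','a+c a-c a/c a*c a3c a\nc',re.DOTALL))
Output after the fix:
['a+c', 'a-c', 'a/c']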
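An invalid pattern such as the bad range above fails when the pattern is compiled, and the failure can be caught as re.error. A minimal sketch, where try_compile is a hypothetical helper written for this example:

```python
import re

def try_compile(pattern):
    """Return a compiled pattern, or None if the pattern is invalid."""
    try:
        return re.compile(pattern)
    except re.error as exc:
        print(f'invalid pattern {pattern!r}: {exc}')
        return None

bad = try_compile('a[+-#/]c')      # hyphen in the middle: read as the range + to #
good = try_compile(r'a[+\-#/]c')   # escaped hyphen: a literal -

print(bad)                                # None
print(good.findall('a+c a-c a/c a*c'))    # ['a+c', 'a-c', 'a/c']
```

Catching the error this way is mostly useful when patterns come from user input or configuration rather than from the source code itself.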

        
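Repetition: ?, *, + and {n,m} cannot be used on their own; each must have a character to its left.

? means the character to its left appears 0 or 1 times
print(re.findall('ab?','a ab abb abbb a123b a123bbb'))
Output:
['a', 'ab', 'ab', 'ab', 'a', 'a']

* means the character to its left appears 0 or more times
print(re.findall('ab*','a ab abb abbb a123b a123bbb'))
Output:
['a', 'ab', 'abb', 'abbb', 'a', 'a']

+ means the character to its left appears 1 or more times
print(re.findall('ab+','a ab abb abbb a123b a123bbb'))
Output:
['ab', 'abb', 'abbb']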

{n,m} means the character to its left appears n to m times
print(re.findall('ab{1,3}','a ab abb abbb a123b a123bbb'))
print(re.findall('ab{1,}','a ab abb abbb a123b a123bbb')) # same as ab+ (the comma must be an ASCII comma)
print(re.findall('ab{0,}','a ab abb abbb a123b a123bbb'))  # same as ab*
print(re.findall('ab{3}','a ab abb abbb a123b a123bbb'))  # exactly 3 times
Output:
['ab', 'abb', 'abbb']
['ab', 'abb', 'abbb']
['a', 'ab', 'abb', 'abbb', 'a', 'a']
['abbb']
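A quantifier binds only to the single item on its left. To repeat a longer sequence, that sequence can be wrapped in a group first. A small sketch of the difference, with a made-up sample string:

```python
import re

text = 'ababab ab abb'  # hypothetical sample text

# {2} applies only to b: a followed by exactly two b's.
single_char = re.findall(r'ab{2}', text)

# (?:ab){2} repeats the whole sequence ab twice; (?:...) groups without capturing.
whole_group = re.findall(r'(?:ab){2}', text)

print(single_char)   # ['abb']
print(whole_group)   # ['abab']
```

The non-capturing form (?:...) is used here so that findall still returns the whole match rather than the group content.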

[^] negates a set:
print(re.findall('a[^0-9]c','a a1d a2ca aac a\nc',re.DOTALL))  # three characters: a, one character that is not a digit, then c
Output:
['aac', 'a\nc']

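.* matches any run of 0 or more characters. It is greedy, and is used less often.
print(re.findall('a.*c','a345ca34tcsf'))
Output:
['a345ca34tc']  # extends to the farthest c

.*? matches any run of 0 or more characters. It is non-greedy, and is the commonly used form.
print(re.findall('a.*?c','a31c34atcsf'))  # stops at the nearest c after each a, so it may return several results
Output:
['a31c', 'atc']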
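The difference between greedy and non-greedy matching is easiest to see on markup, where greedy matching swallows everything between the first opening and the last closing delimiter. A minimal sketch with a made-up HTML fragment:

```python
import re

html = '<b>bold</b> and <i>italic</i>'  # hypothetical sample

greedy = re.findall(r'<.*>', html)    # runs to the last > in the string
lazy = re.findall(r'<.*?>', html)     # stops at the first > after each <

print(greedy)   # ['<b>bold</b> and <i>italic</i>']
print(lazy)     # ['<b>', '</b>', '<i>', '</i>']
```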

| means "or"; it joins two regular expressions.
print(re.findall('company|companies','Too many companies have gone bankrupt, and the next one is my company')) # find company or companies
Output:
['companies', 'company']

() grouping
print(re.findall('compan(y|ies)','Too many companies have gone bankrupt, and the next one is my company'))  # when the pattern contains a group, findall returns only the text inside the parentheses
Output:
['ies', 'y']

print(re.findall('compan(?:y|ies)','Too many companies have gone bankrupt, and the next one is my company')) # with a capturing group, findall returns the group content rather than the whole match; ?: makes the group non-capturing, so the whole match is returned
Output:
['companies', 'company']

A small combined example: extracting video link addresses.

print(re.findall('href="(.*?)"','<p>動感視頻</p><a href="https://www.douniwan.com/1.mp4">逗你玩呢</a><a href="https://www.xxx.com/2.mp4">葫蘆娃</a>'))
Output:
['https://www.douniwan.com/1.mp4', 'https://www.xxx.com/2.mp4']

'href="(.*?)"' looks for text of the form href="...". On a match, only the content inside the parentheses is returned.
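When both the whole match and the group content are needed, re.finditer is one option: it yields a match object per hit instead of a plain string. A sketch under the assumption of a simplified HTML fragment (the link texts are made up):

```python
import re

html = ('<p>clip</p><a href="https://www.douniwan.com/1.mp4">one</a>'
        '<a href="https://www.xxx.com/2.mp4">two</a>')  # simplified sample

whole = []
urls = []
for m in re.finditer(r'href="(.*?)"', html):
    whole.append(m.group(0))   # the complete matched text, including href="..."
    urls.append(m.group(1))    # only the part inside the parentheses

print(whole)
print(urls)
```

For real-world HTML a dedicated parser is usually more robust than a regex; this only illustrates how groups behave.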

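Escaping with \:

A pattern written as an ordinary string literal is processed twice: first the Python interpreter resolves the escapes in the string literal, then the regex engine in the re module resolves the escapes in the resulting pattern.

print(re.findall('a\\\\c',r'a\c aac'))  # \ is the escape character: in the string literal the first \ escapes the second and the third escapes the fourth, so the regex engine receives a\\c; the engine then treats the first \ as escaping the second, so the pattern matches the literal text a\c
print(re.findall(r'a\\c',r'a\c aac'))  # with the r (raw) prefix the string literal is left untouched, so the regex engine receives a\\c directly
Output:
['a\\c']  # the list shows the repr of the string, in which \ is itself escaped; the actual match is a\c
['a\\c']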
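The two spellings above can be compared directly, and re.escape can build the escaped pattern automatically. A minimal sketch; the variable names are chosen for this example only.

```python
import re

normal = 'a\\\\c'   # ordinary literal: four backslashes in the source
raw = r'a\\c'       # raw literal: two backslashes in the source

print(normal == raw)   # True: both are the 4-character pattern a\\c
print(len(raw))        # 4

# re.escape builds a pattern that matches its argument literally.
literal = 'a\\c'               # the 3-character text a\c
escaped = re.escape(literal)
print(escaped == raw)                     # True
print(re.findall(escaped, 'a\\c aac'))    # ['a\\c']
```

Writing patterns as raw strings by default avoids most of this double escaping.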

re.I ignores case

print(re.findall('alex','my name is alex Alex is sb alex Alex',re.I))
Output:
['alex', 'Alex', 'alex', 'Alex']

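re.M (multi-line mode)

msg="""
my name is egon
sfaf egon
sdf23 egon
"""
print(re.findall('egon$',msg))  # without re.M the whole of msg is treated as a single line, so $ matches only at the very end (just before the trailing newline)
Output:
['egon']


msg="""
my name is egon
sfaf egon
sdf23 egon
"""
print(re.findall('egon$',msg,re.M))  # with re.M, $ also matches before every newline, so the egon at the end of each line is found
Output:
['egon', 'egon', 'egon']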
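re.M affects ^ in the same way as $: with the flag, ^ matches at the start of every line rather than only at the start of the string. A short sketch that reuses the lines of msg:

```python
import re

msg = 'my name is egon\nsfaf egon\nsdf23 egon\n'

first_words = re.findall(r'^\w+', msg, re.M)   # ^ matches at the start of every line
only_first = re.findall(r'^\w+', msg)          # without re.M: start of the string only

print(first_words)   # ['my', 'sfaf', 'sdf23']
print(only_first)    # ['my']
```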

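Other functions in the re module:

re.search: stops after the first match and returns a match object; if nothing matches, it returns None.

print(re.search('href="(.*?)"','<p>動感視頻</p><a href="https://www.douniwan.com/1.mp4">逗你玩呢</a><a href="https://www.xxx.com/2.mp4">葫蘆娃</a>'))
Output:
<_sre.SRE_Match object; span=(14, 51), match='href="https://www.douniwan.com/1.mp4"'>  # match holds the whole matched text, not the group content! (newer Python 3 versions print re.Match instead of _sre.SRE_Match)
If nothing is found, the output is:
None

res=re.findall('(href)="(.*?)"','<p>動感視頻</p><a href="https://www.douniwan.com/1.mp4">逗你玩呢</a><a href="https://www.xxx.com/2.mp4">葫蘆娃</a>')
print(res)
Output:  # with several groups, findall returns a tuple of all groups for each match; a single group cannot be selected
[('href', 'https://www.douniwan.com/1.mp4'), ('href', 'https://www.xxx.com/2.mp4')]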


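re.search(...).group() lets you choose which group to output.
res=re.search('(href)="(.*?)"','<p>動感視頻</p><a href="https://www.douniwan.com/1.mp4">逗你玩呢</a><a href="https://www.xxx.com/2.mp4">葫蘆娃</a>')   # note the two groups in the pattern
print(res)
print(res.group(0))  # no argument, or 0: the whole match
print(res.group(1))  # content of the first group
print(res.group(2))  # content of the second group; a number larger than the number of groups raises IndexError
Output:
<_sre.SRE_Match object; span=(14, 51), match='href="https://www.douniwan.com/1.mp4"'>
href="https://www.douniwan.com/1.mp4"
href
https://www.douniwan.com/1.mp4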
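Because re.search returns None when nothing matches, calling .group() on the result without a check raises AttributeError. A minimal sketch of a guarded lookup; first_link is a hypothetical helper, and the named group url is an illustration of the (?P<name>...) syntax rather than something the examples above use.

```python
import re

def first_link(html):
    """Return the first href value in html, or None if there is none."""
    match = re.search(r'href="(?P<url>.*?)"', html)
    if match is None:            # re.search returns None when nothing matches
        return None
    return match.group('url')    # same content as match.group(1)

print(first_link('<a href="https://www.douniwan.com/1.mp4">x</a>'))
print(first_link('<p>no links here</p>'))
```

Named groups make the intent clearer once a pattern has more than one or two groups.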

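re.match(pattern, string) behaves like re.search('^' + pattern, string): it matches only at the start of the string.

print(re.findall('alex','alex is alex is alex'))
print(re.search('alex','is alex is alex'))
print(re.search('^alex','is alex is alex'))
print(re.match('alex','alex is alex is alex'))
Output:
['alex', 'alex', 'alex']  # findall finds every match
<_sre.SRE_Match object; span=(3, 7), match='alex'>  # search stops at the first match
None   # ^ anchors at the start; the string does not start with alex, so None is returned
<_sre.SRE_Match object; span=(0, 4), match='alex'>  # match looks only at the start of the string; this string starts with alex, so a match object is returned (otherwise None)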
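re.match anchors only the start of the string; the rest of the string may contain anything. When the whole string has to match, re.fullmatch is the stricter relative. A short sketch of the difference:

```python
import re

prefix_ok = re.match('alex', 'alexander') is not None      # only the start must match
whole_ok = re.fullmatch('alex', 'alexander') is not None   # the entire string must match
exact_ok = re.fullmatch('alex', 'alex') is not None

print(prefix_ok)   # True
print(whole_ok)    # False
print(exact_ok)    # True
```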

re.compile
pattern=re.compile('alex')  # compile the pattern once so that it can be reused
print(pattern.findall('alex is alex is alex'))
print(pattern.search('alex is alex is alex'))
print(pattern.match('alex is alex is alex'))
Output:
['alex', 'alex', 'alex']
<_sre.SRE_Match object; span=(0, 4), match='alex'>
<_sre.SRE_Match object; span=(0, 4), match='alex'>
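A compiled pattern can also carry its flags, which keeps repeated calls short when the same pattern is applied to many strings. A sketch with a hypothetical pattern and made-up input lines:

```python
import re

# Hypothetical pattern: the whole word alex, in any letter case.
WORD_ALEX = re.compile(r'\balex\b', re.I)

lines = ['Alex is here', 'alexander is not alex', 'nobody']  # made-up input
counts = [len(WORD_ALEX.findall(line)) for line in lines]

print(counts)   # [1, 1, 0]
```

Here \b is a word boundary, so the alex inside alexander is not counted.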

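Exercise

Calculator assignment

msg="1-2*(60+(-40.35/5)-(-4*3))"
print(re.findall(r'\D?(\-?\d+\.?\d*)',msg))  # \D? consumes the preceding operator or parenthesis, so a - is captured as a sign only when another non-digit comes right before it
Output:
['1', '2', '60', '-40.35', '5', '-4', '3']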
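One possible next step for the calculator exercise, offered as an assumption about how it might be approached rather than as the intended solution, is to locate the innermost parenthesized sub-expressions so they can be evaluated first:

```python
import re

msg = "1-2*(60+(-40.35/5)-(-4*3))"

# [^()]+ forbids nested parentheses, so only innermost pairs can match.
innermost = re.findall(r'\(([^()]+)\)', msg)

print(innermost)   # ['-40.35/5', '-4*3']
```

Replacing each innermost expression with its computed value and repeating until no parentheses remain would reduce the whole expression step by step.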

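Part 2: The hashlib module

hashlib provides many hash algorithms; "hash" is the general name for this kind of algorithm.

Use: verifying data
hash: an algorithm that takes data of any size as input and computes a return value
The return value is the hash value
Three properties:
"""
1. The same input always produces the same hash value
2. For a given hash algorithm, the hash value has a fixed length no matter how large the input is
3. A hash value is not reversible: the input cannot practically be recovered from it

1+2 => file integrity checks
3 => protecting transmitted data such as passwords (a one-way digest, not reversible encryption)
"""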

import hashlib

m=hashlib.md5()   # use the md5 hash algorithm
m.update('你好'.encode('utf-8'))  # update() takes bytes, so encode the text as UTF-8
m.update('hello'.encode('utf-8'))
print(m.hexdigest())
Output:
65c83c71cb3b2e2882f99358430679c3

This can also be written as:
m=hashlib.md5()   # use the md5 hash algorithm
m.update('你好hello'.encode('utf-8'))  # encode as UTF-8
print(m.hexdigest())
Output:
65c83c71cb3b2e2882f99358430679c3  # the same hash value as in the previous example
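The first two properties can be checked directly. The sketch below confirms that feeding the data in pieces gives the same digest as feeding it at once, and that the digest length depends on the algorithm rather than on the input size. The choice of sha256 as a second algorithm is an addition for comparison.

```python
import hashlib

data = '你好hello'.encode('utf-8')

# Feeding the data in two pieces gives the same digest as feeding it at once.
piecewise = hashlib.md5()
piecewise.update('你好'.encode('utf-8'))
piecewise.update('hello'.encode('utf-8'))
one_shot = hashlib.md5(data)
same = piecewise.hexdigest() == one_shot.hexdigest()
print(same)   # True

# The digest length depends on the algorithm, not on the input size.
lengths = {}
for name in ('md5', 'sha256'):
    short = hashlib.new(name, b'a').hexdigest()
    long_ = hashlib.new(name, b'a' * 100000).hexdigest()
    lengths[name] = (len(short), len(long_))
print(lengths)   # {'md5': (32, 32), 'sha256': (64, 64)}
```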



m1=hashlib.md5(b'hello') # ASCII text can also be passed straight to the constructor; the b prefix makes it a bytes literal
print(m1.hexdigest())
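Since update() can be called repeatedly, a file integrity check does not need to load the whole file into memory. The following is a minimal sketch; file_md5, its chunk_size parameter, and the default of 8192 bytes are assumptions made for this example.

```python
import hashlib

def file_md5(path, chunk_size=8192):
    """Return the md5 hex digest of the file at path, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        # iter(callable, sentinel) keeps calling f.read until it returns b''.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing the result with a published checksum, or with the digest computed on the sender's side, shows whether the file arrived unchanged. For new designs a stronger algorithm such as hashlib.sha256 can be substituted in the same way.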
