Python: batch-removing whatever sits between two markers in a string


A while ago I was scraping Douban Movies and ran into a problem with markup like this:

<span class="actor"><span class='pl'>主演</span>: <span class='attrs'><a href="/subject_search?search_text=Arash%20Marandi" rel="v:starring">Arash Marandi</a> / <a href="/subject_search?search_text=Flor%20Eduarda%20Gurrola" rel="v:starring">Flor Eduarda Gurrola</a> / <a href="/celebrity/1352068/" rel="v:starring">路易斯·阿伯提</a> / <a href="/celebrity/1291820/" rel="v:starring">埃利希奧·梅蘭德斯</a> / <a href="/subject_search?search_text=Eduardo%20Mendiz%C3%A1bal" rel="v:starring">Eduardo Mendizábal</a> / <a href="/subject_search?search_text=Edwarda%20Gurrola" rel="v:starring">Edwarda Gurrola</a> / <a href="/subject_search?search_text=Uriel%20Ledesma" rel="v:starring">Uriel Ledesma</a> / <a href="/subject_search?search_text=Ishbel%20Mata" rel="v:starring">Ishbel Mata</a></span></span><br/>

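  I want to extract every lead actor (主演) from it. The number of actors varies from film to film, so a fixed regex with one capture group per actor won't work. What does work is capturing the whole run of links in a single group:

    items = re.findall('<span class="actor"><span class=.*?>主演</span>: <span class=.*?><a href=".*?" rel="v:starring">(.*?)</a></span></span><br/>', page, re.S)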

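  Once that text is captured, replace everything from each closing </a> up to the next > with a slash:

    # With the single-group pattern above, findall returns a list of strings,
    # so items[0] is the captured text. If the pattern has several groups,
    # each item is a tuple and you index into it instead (e.g. items[0][0]).
    items1 = items[0]
    print(re.sub('</a>.*?>', '/', items1))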

  The result:

Arash Marandi/Flor Eduarda Gurrola/路易斯·阿伯提/埃利希奧·梅蘭德斯/Eduardo Mendizábal/Edwarda Gurrola/Uriel Ledesma/Ishbel Mata
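
The two steps above can be tried end to end with a small script. This is only a sketch: `page` is a shortened, hand-written stand-in for the downloaded page source (three actors instead of eight), and the names `pattern`, `captured` and `actors` are mine, not from the original code. Passing `flags=re.S` to `re.sub` is an extra precaution in case the real markup contains line breaks inside a tag.

```python
import re

# Shortened, hypothetical stand-in for the downloaded page source.
page = (
    '<span class="actor"><span class=\'pl\'>主演</span>: <span class=\'attrs\'>'
    '<a href="/subject_search?search_text=Arash%20Marandi" rel="v:starring">Arash Marandi</a> / '
    '<a href="/celebrity/1352068/" rel="v:starring">路易斯·阿伯提</a> / '
    '<a href="/subject_search?search_text=Ishbel%20Mata" rel="v:starring">Ishbel Mata</a>'
    '</span></span><br/>'
)

# Step 1: capture everything from the first actor's name up to the last </a>.
pattern = (
    '<span class="actor"><span class=.*?>主演</span>: <span class=.*?>'
    '<a href=".*?" rel="v:starring">(.*?)</a></span></span><br/>'
)
items = re.findall(pattern, page, re.S)
captured = items[0]  # one capture group, so each item is a plain string

# Step 2: collapse each "</a> / <a href=... >" run into a single slash.
actors = re.sub('</a>.*?>', '/', captured, flags=re.S)
print(actors)
```

Because the lazy `.*?` in the substitution stops at the first `>` after each `</a>`, the number of actors does not matter: every separator is rewritten in one pass.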

 

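Before arriving at that, I had tried defining my own function:

    def delete_word(code):
        # Collect every chunk that sits between '</a>' and the next '">'.
        temp = re.findall('</a>(.*?)">', code, re.S)
        return temp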

  

    # for i in range(len(delete_word(items[0][2]))):
    #     print(items1.replace(delete_word(items[0][2])[i], ""))
    # print(items1)
    # print("Director: " + items[0][1].replace(str(delete_word(items[0][1])), "").replace("</a>\">", "/"))
    # print("Screenwriter: " + items[0][2].replace(str(delete_word(items[0][2])), "").replace("</a>\">", "/"))
    # print(len(items))

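  That version did not work. str.replace returns a new string and leaves items1 unchanged, so the loop never accumulates its removals, and str() of a list gives the list's printed form (brackets and quotes included), which never appears in the HTML, so that replace does nothing. The lesson: when one route is a dead end, try another.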
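
For comparison, the function-based idea can be made to work if each replacement is assigned back to a variable. The following is a minimal sketch, not the original code: `items1` here is a hand-written sample of captured text and `cleaned` is a name I introduced for illustration.

```python
import re

def delete_word(code):
    # Every chunk between '</a>' and the next '">', i.e. the " / " separator
    # plus the attributes of the following <a> tag.
    return re.findall('</a>(.*?)">', code, re.S)

# Hypothetical sample of what the capture group returns (shortened).
items1 = (
    'Arash Marandi</a> / <a href="/celebrity/1352068/" rel="v:starring">路易斯·阿伯提'
    '</a> / <a href="/subject_search?search_text=Ishbel%20Mata" rel="v:starring">Ishbel Mata'
)

cleaned = items1
for chunk in delete_word(items1):
    # Reassign on every pass: str.replace returns a new string.
    cleaned = cleaned.replace(chunk, '')

# Only the '</a>">' markers remain between the names; turn them into slashes.
cleaned = cleaned.replace('</a>">', '/')
print(cleaned)
```

This gives the same slash-separated list, but it takes a loop and two kinds of replacement where the single `re.sub` call needs one line, which is why the regex substitution is the more convenient route here.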

