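I tried the approaches from many articles online, but none of them managed to strip the script tags, most likely because their regular expressions were wrong. I'm not good with regex myself, so in the end I just used BeautifulSoup.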
from bs4 import BeautifulSoup

# html holds the HTML source that was fetched earlier
soup = BeautifulSoup(html, "lxml")

# Remove every <script> and <style> element from the tree
for tag in soup.find_all(["script", "style"]):
    tag.extract()

# Only the remaining text is printed
print(soup.get_text())
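If bs4 or lxml can't be installed, the same idea can be roughly sketched with Python's standard-library `html.parser`. This is not the BeautifulSoup approach above, just a swapped-in alternative. The names `VisibleTextExtractor`, `strip_scripts_and_styles`, and `sample` are made up for illustration, and it assumes reasonably well-formed HTML.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes, skipping anything inside <script> or <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # greater than 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside <script> or <style>
        if self._skip_depth == 0:
            self._chunks.append(data)

    def get_text(self):
        return "".join(self._chunks)


def strip_scripts_and_styles(html):
    """Return the text of an HTML string without script or style content."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.get_text()


# Hypothetical sample input, standing in for the fetched HTML source
sample = (
    "<html><head><style>p { color: red; }</style></head>"
    "<body><p>Hello</p><script>alert('x');</script><p>World</p></body></html>"
)
print(strip_scripts_and_styles(sample))
```

Unlike BeautifulSoup, this does not try to repair broken markup, so for messy real-world pages the bs4 version is probably still the safer choice.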