Sometimes the data we need to clean contains hyperlinks. How do we remove them? Take the following HTML as an example:
<div class="lot-page-details"><ul class="info-list"><li class="lot-info-item"><p><strong class="section-header">Provenance</strong></p><p>Brand New
Gallery, Milan<br/>Acquired from the above by the present owner</p></li><li class="lot-info-item"><p><strong class="section-header">Exhibited</strong>
</p><p>Milan, Brand New Gallery, <em>This is the story of America. Everybody's doing what they think they're supposed to do</em>, November 21, 2013
- January 11, 2014</p></li><li class="artist-biography"><p><strong class="section-header">Artist Bio
</strong></p><a href="/artist/12106/ethan-cook"><h4>Ethan Cook</h4></a><p class="artist-info">American • 1983
</p><div class="follow-artist" data-artist-id="12106"
role="button"
tabindex="0">
<span class="icon"></span><span class="tooltip">Follow</span></div><div class="artist-bio"><p>
<p>New York-based artist Ethan Cook is known for his abstract paintings on self-produced canvases. More recently, he has used handwoven strips of
cotton and linen to create painterly compositions. Cook's woven canvases are contemporary in their minimalist focus on shape and color while referencing
one of the most traditional art forms, weaving. Cook weaves his own canvases on a loom and juxtaposes these with store-bought canvas sheets
in abstract arrangements. For the artist, the surface of the canvas itself becomes the focus of his practice. Using simple geometric shapes and a
limited color palate, Cook's works nurture structural simplicity.</p></p><a href="/artist/12106/ethan-cook"><div class="lot-essay-button artist"><em>View More Works</em></div></a></div></li></ul></div>
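Method 1:
Use a regular-expression substitution: replacing href with hre1f is enough, because the renamed attribute is no longer treated as a link.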
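The post does not show the substitution itself, so here is a minimal sketch of what Method 1 might look like. The function name `neutralize_links` and the sample string are made up for illustration; only `re.sub` from the standard library is used.

```python
import re

def neutralize_links(html: str) -> str:
    """Rename every href attribute to hre1f (hypothetical helper)."""
    # \b stops the pattern from matching inside a longer word such as "xhref";
    # the lookahead only accepts "href" when it is followed by "=".
    return re.sub(r'\bhref(?=\s*=)', 'hre1f', html)

sample = '<a href="/artist/12106/ethan-cook"><h4>Ethan Cook</h4></a>'
print(neutralize_links(sample))
# <a hre1f="/artist/12106/ethan-cook"><h4>Ethan Cook</h4></a>
```

The anchor tags stay in the markup; only the attribute name changes, so the text is kept but nothing is clickable.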
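Method 2:
import re

# result_div is assumed to be the element (or HTML string) scraped earlier
item_desc = str(result_div)
# Collect the contents of every tag, i.e. the text between '<' and '>'
result_div_list = re.findall('<(.*?)>', item_desc)
for ii in result_div_list:
    if 'href' in ii:
        # Remove the whole opening tag, angle brackets included.
        # Work on item_desc so earlier removals are not lost.
        # Note: the matching closing </a> tags are left in place.
        item_desc = item_desc.replace('<' + ii + '>', '')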
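Method 2 only targets tags that contain href, so the closing </a> tags stay behind. As an alternative sketch (not the author's code), one substitution can drop both the opening and the closing anchor tags while keeping the content between them. The names `ANCHOR_TAG` and `strip_anchor_tags` are hypothetical, and a regex like this assumes reasonably simple markup with no ">" inside attribute values.

```python
import re

# Matches an opening <a ...> tag or a closing </a> tag, case-insensitively.
# \b keeps tags such as <article> or <abbr> from matching.
ANCHOR_TAG = re.compile(r'</?a\b[^>]*>', re.IGNORECASE)

def strip_anchor_tags(html: str) -> str:
    """Remove anchor tags but keep whatever they wrap (hypothetical helper)."""
    return ANCHOR_TAG.sub('', html)

sample = '<p>Bio</p><a href="/artist/12106/ethan-cook"><h4>Ethan Cook</h4></a>'
print(strip_anchor_tags(sample))
# <p>Bio</p><h4>Ethan Cook</h4>
```

For messier pages, an HTML parser is usually a safer choice than a regex.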
Noted here for future study and reference.
