Removing HTML Tags with a Regular Expression



Purpose

Put another way, the goal in the title is to use a regular expression to extract the text content from inside HTML tags.

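I have the source of an HTML document, a postdoc job posting, and want to extract the recruitment information from it with a regular expression.
First, I located the div tag that holds the posting. Its content is shown below (the text begins with "Postdoctoral Scholar in Translational Bioinformatics, Computational Biology for Precision Cancer Medicine"):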

<div class="rich_media_content " id="js_content" style="visibility: hidden;">

                    

                    

                    
                    
                    <p style="line-height: 1.75em;letter-spacing: 0.54px;font-family: -apple-system-font, BlinkMacSystemFont, &quot;Helvetica Neue&quot;, &quot;PingFang SC&quot;, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei UI&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;margin-right: 8px;margin-left: 8px;white-space: normal;min-height: 1em;max-width: 100%;box-sizing: border-box !important;background-color: rgb(255, 255, 255);overflow-wrap: break-word;"><span style="font-size: 15px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;">Postdoctoral Scholar in Translational Bioinformatics, Computational Biology for Precision Cancer Medicine</span></p><p style="line-height: 1.75em;letter-spacing: 0.54px;font-family: -apple-system-font, BlinkMacSystemFont, &quot;Helvetica Neue&quot;, &quot;PingFang SC&quot;, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei UI&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;margin-right: 8px;margin-left: 8px;white-space: normal;min-height: 1em;max-width: 100%;box-sizing: border-box !important;background-color: rgb(255, 255, 255);overflow-wrap: break-word;"><br  /></p><p style="line-height: 1.75em;letter-spacing: 0.54px;font-family: -apple-system-font, BlinkMacSystemFont, &quot;Helvetica Neue&quot;, &quot;PingFang SC&quot;, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei UI&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;margin-right: 8px;margin-left: 8px;white-space: normal;min-height: 1em;max-width: 100%;box-sizing: border-box !important;background-color: rgb(255, 255, 255);overflow-wrap: break-word;"><span style="font-size: 15px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;">The&nbsp;</span><span style="color: rgb(171, 25, 66);max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;"><strong style="max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;"><span style="font-size: 15px;max-width: 
100%;box-sizing: border-box !important;overflow-wrap: break-word;">Ruijiang Li&nbsp;</span></strong></span><span style="font-size: 15px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;">lab at Stanford University School of Medicine is looking for a highly motivated postdoctoral scholar. The major focus of the lab is to&nbsp;develop, validate, and clinically translate diagnostic, prognostic, predictive biomarkers for precision cancer medicine. We integrate datasets of large patient populations and develop novel statistical and machine learning methods.</span></p><p style="line-height: 1.75em;letter-spacing: 0.54px;font-family: -apple-system-font, BlinkMacSystemFont, &quot;Helvetica Neue&quot;, &quot;PingFang SC&quot;, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei UI&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;margin-right: 8px;margin-left: 8px;white-space: normal;min-height: 1em;max-width: 100%;box-sizing: border-box !important;background-color: rgb(255, 255, 255);overflow-wrap: break-word;"><br  /></p><p style="line-height: 1.75em;letter-spacing: 0.54px;font-family: -apple-system-font, BlinkMacSystemFont, &quot;Helvetica Neue&quot;, &quot;PingFang SC&quot;, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei UI&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;margin-right: 8px;margin-left: 8px;white-space: normal;min-height: 1em;max-width: 100%;box-sizing: border-box !important;background-color: rgb(255, 255, 255);overflow-wrap: break-word;"><span style="font-size: 15px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;">Our lab is generously funded by 3 active NIH R01 grants. Our work has been published in top clinical journals such as JAMA Oncology, Annals of Surgery, Clinical Cancer&nbsp;Research, Radiology. 
Major awards to my postdoc trainees include the prestigious NIH K99/R00 Pathway to Independence Award, which provides with $1,000,000 over 5 years to establish an independent research program. The awardee has secured a tenure-track faculty position at MD Anderson Cancer Center. Please visit&nbsp;</span><span style="font-size: 15px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;"><strong style="max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;"><span style="color: rgb(61, 170, 214);max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;">http://med.stanford.edu/lilab</span></strong></span></p><p style="line-height: 1.75em;letter-spacing: 0.54px;font-family: -apple-system-font, BlinkMacSystemFont, &quot;Helvetica Neue&quot;, &quot;PingFang SC&quot;, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei UI&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;margin-right: 8px;margin-left: 8px;white-space: normal;min-height: 1em;max-width: 100%;box-sizing: border-box !important;background-color: rgb(255, 255, 255);overflow-wrap: break-word;"><br style="max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;"  /></p><p style="line-height: 1.75em;letter-spacing: 0.54px;font-family: -apple-system-font, BlinkMacSystemFont, &quot;Helvetica Neue&quot;, &quot;PingFang SC&quot;, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei UI&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;margin-right: 8px;margin-left: 8px;white-space: normal;min-height: 1em;max-width: 100%;box-sizing: border-box !important;background-color: rgb(255, 255, 255);overflow-wrap: break-word;"><span style="font-size: 15px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;">Candidates from a diverse background are encouraged to apply. 
The applicant may hold a PhD either in math, physical sciences or engineering with a strong motivation to solve biomedical problems, or in biomedical sciences with a strong interest to apply computational approaches. The ideal candidates will have strong analytic and computational skills, as well as prior research experience in cancer genomics, epigenomics, transcriptomics, or multi-omic data integration. Basic knowledge in molecular biology or tumor immunology is helpful.</span></p><p style="line-height: 1.75em;letter-spacing: 0.54px;font-family: -apple-system-font, BlinkMacSystemFont, &quot;Helvetica Neue&quot;, &quot;PingFang SC&quot;, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei UI&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;margin-right: 8px;margin-left: 8px;white-space: normal;min-height: 1em;max-width: 100%;box-sizing: border-box !important;background-color: rgb(255, 255, 255);overflow-wrap: break-word;"><br  /></p><p style="line-height: 1.75em;letter-spacing: 0.54px;font-family: -apple-system-font, BlinkMacSystemFont, &quot;Helvetica Neue&quot;, &quot;PingFang SC&quot;, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei UI&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;margin-right: 8px;margin-left: 8px;white-space: normal;min-height: 1em;max-width: 100%;box-sizing: border-box !important;background-color: rgb(255, 255, 255);overflow-wrap: break-word;"><span style="font-size: 15px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;">Stanford University is located at the heart of Silicon Valley, epicenter of the technology revolution in biomedicine. 
This is an excellent opportunity not only for those motivated to pursue an academic career, but also for those interested in entrepreneurship with the goal of commercialization and translation of new technology into clinical practice.</span></p><p style="line-height: 1.75em;letter-spacing: 0.54px;font-family: -apple-system-font, BlinkMacSystemFont, &quot;Helvetica Neue&quot;, &quot;PingFang SC&quot;, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei UI&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;margin-right: 8px;margin-left: 8px;white-space: normal;min-height: 1em;max-width: 100%;box-sizing: border-box !important;background-color: rgb(255, 255, 255);overflow-wrap: break-word;"><br  /></p><p style="line-height: 1.75em;letter-spacing: 0.54px;font-family: -apple-system-font, BlinkMacSystemFont, &quot;Helvetica Neue&quot;, &quot;PingFang SC&quot;, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei UI&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;margin-right: 8px;margin-left: 8px;white-space: normal;min-height: 1em;max-width: 100%;box-sizing: border-box !important;background-color: rgb(255, 255, 255);overflow-wrap: break-word;"><span style="font-size: 15px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;">Interested applicants should send a research statement, CV, and names of three references to:</span></p><p style="line-height: 1.75em;letter-spacing: 0.54px;font-family: -apple-system-font, BlinkMacSystemFont, &quot;Helvetica Neue&quot;, &quot;PingFang SC&quot;, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei UI&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;margin-right: 8px;margin-left: 8px;white-space: normal;min-height: 1em;max-width: 100%;box-sizing: border-box !important;background-color: rgb(255, 255, 255);overflow-wrap: break-word;"><span style="font-size: 15px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;">Ruijiang Li, PhD</span></p><p style="line-height: 
1.75em;letter-spacing: 0.54px;font-family: -apple-system-font, BlinkMacSystemFont, &quot;Helvetica Neue&quot;, &quot;PingFang SC&quot;, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei UI&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;margin-right: 8px;margin-left: 8px;white-space: normal;min-height: 1em;max-width: 100%;box-sizing: border-box !important;background-color: rgb(255, 255, 255);overflow-wrap: break-word;"><br  /></p><p style="line-height: 1.75em;letter-spacing: 0.54px;font-family: -apple-system-font, BlinkMacSystemFont, &quot;Helvetica Neue&quot;, &quot;PingFang SC&quot;, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei UI&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;margin-right: 8px;margin-left: 8px;white-space: normal;min-height: 1em;max-width: 100%;box-sizing: border-box !important;background-color: rgb(255, 255, 255);overflow-wrap: break-word;"><span style="font-size: 15px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;">Email:</span><span style="color: rgb(217, 33, 66);max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;"><strong style="max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;"><span style="font-size: 15px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;">rli2@stanford.edu</span></strong></span></p><p style="line-height: 1.75em;letter-spacing: 0.54px;font-family: -apple-system-font, BlinkMacSystemFont, &quot;Helvetica Neue&quot;, &quot;PingFang SC&quot;, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei UI&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;margin-right: 8px;margin-left: 8px;white-space: normal;min-height: 1em;max-width: 100%;box-sizing: border-box !important;background-color: rgb(255, 255, 255);overflow-wrap: break-word;"><span style="color: rgb(217, 33, 66);max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;"><strong style="max-width: 100%;box-sizing: border-box 
!important;overflow-wrap: break-word;"><span style="font-size: 15px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;"><br  /></span></strong></span></p><p style="text-indent: 0em;letter-spacing: 0.54px;font-family: -apple-system-font, BlinkMacSystemFont, &quot;Helvetica Neue&quot;, &quot;PingFang SC&quot;, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei UI&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;margin-right: 8px;margin-left: 8px;white-space: normal;min-height: 1em;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;"><span style="color: rgb(61, 170, 214);font-size: 15px;-ms-word-wrap: break-word !important;max-width: 100%;box-sizing: border-box !important;"><strong style="-ms-word-wrap: break-word !important;max-width: 100%;box-sizing: border-box !important;">請在應聘材料上注明此職位信息來源於BioArt。</strong></span></p><p style="letter-spacing: 0.54px;font-family: -apple-system-font, BlinkMacSystemFont, &quot;Helvetica Neue&quot;, &quot;PingFang SC&quot;, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei UI&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;margin-right: 8px;margin-left: 8px;white-space: normal;min-height: 1em;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;"><br style="-ms-word-wrap: break-word !important;max-width: 100%;box-sizing: border-box !important;"  /></p><p style="letter-spacing: 0.54px;font-family: -apple-system-font, BlinkMacSystemFont, &quot;Helvetica Neue&quot;, &quot;PingFang SC&quot;, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei UI&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;margin-right: 8px;margin-left: 8px;white-space: normal;min-height: 1em;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;"><strong style="-ms-word-wrap: break-word !important;max-width: 100%;box-sizing: border-box !important;"><em style="-ms-word-wrap: break-word !important;max-width: 100%;box-sizing: border-box !important;"><span 
style="font-size: 15px;-ms-word-wrap: break-word !important;max-width: 100%;box-sizing: border-box !important;">溫馨提示<em style="letter-spacing: 0.54px;font-family: -apple-system-font, system-ui, &quot;Helvetica Neue&quot;, &quot;PingFang SC&quot;, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei UI&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;-ms-word-wrap: break-word !important;max-width: 100%;box-sizing: border-box !important;"><span style="-ms-word-wrap: break-word !important;max-width: 100%;box-sizing: border-box !important;">:</span></em></span></em></strong><em style="-ms-word-wrap: break-word !important;max-width: 100%;box-sizing: border-box !important;"><span style="font-size: 15px;-ms-word-wrap: break-word !important;max-width: 100%;box-sizing: border-box !important;">BioArt原則上每年可為每個課題組免費發布一次博后招聘廣告,博后廣告請直接將word文檔發送到</span></em><strong style="-ms-word-wrap: break-word !important;max-width: 100%;box-sizing: border-box !important;"><span style="color: rgb(217, 33, 66);-ms-word-wrap: break-word !important;max-width: 100%;box-sizing: border-box !important;"><em style="-ms-word-wrap: break-word !important;max-width: 100%;box-sizing: border-box !important;"><span style="font-size: 15px;-ms-word-wrap: break-word !important;max-width: 100%;box-sizing: border-box !important;">sinobioart@bioart.com.cn</span></em></span></strong><em style="-ms-word-wrap: break-word !important;max-width: 100%;box-sizing: border-box !important;"><span style="font-size: 15px;-ms-word-wrap: break-word !important;max-width: 100%;box-sizing: border-box !important;">或加微信ID:</span></em><strong style="-ms-word-wrap: break-word !important;max-width: 100%;box-sizing: border-box !important;"><span style="color: rgb(217, 33, 66);-ms-word-wrap: break-word !important;max-width: 100%;box-sizing: border-box !important;"><em style="-ms-word-wrap: break-word !important;max-width: 100%;box-sizing: border-box !important;"><span style="font-size: 15px;-ms-word-wrap: break-word 
!important;max-width: 100%;box-sizing: border-box !important;">bioartbusiness&nbsp;</span></em></span></strong>。</p><p style="text-indent: 0em;letter-spacing: 0.54px;font-family: -apple-system-font, BlinkMacSystemFont, &quot;Helvetica Neue&quot;, &quot;PingFang SC&quot;, &quot;Hiragino Sans GB&quot;, &quot;Microsoft YaHei UI&quot;, &quot;Microsoft YaHei&quot;, Arial, sans-serif;white-space: normal;min-height: 1em;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;"><br style="-ms-word-wrap: break-word !important;max-width: 100%;box-sizing: border-box !important;"  /></p><section style="text-align: center;margin-right: 8px;margin-left: 8px;white-space: normal;"><img class="rich_pages" data-ratio="0.5244444444444445" data-type="png" data-w="900" data-s="300,640" data-src="https://mmbiz.qpic.cn/mmbiz_png/zO6xlS3tgcHuPkfM4BsYWV2yO5SfZ74tFljA68n6B6gLcsWWBkG1euFL5UvFSf2mcxhMfMHv4libLrzwJiatpADA/640?wx_fmt=png"  /></section>

Method

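Use Sublime Text with a regular expression.
Tags follow a more regular pattern than the text does, so match every tag with a regular expression and then invert the selection. What remains selected is the text of the posting.
Steps: paste the content above into Sublime Text, press Ctrl+F, enable the regular expression option, and enter <[^>]+>. Click Find All, then choose Selection > Invert Selection. All of the posting's text is now selected and can be copied into a new document. To make the result easier to read, replace the HTML non-breaking space entity &nbsp; with an ordinary space.
The filtered content is as follows: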

Postdoctoral Scholar in Translational Bioinformatics, Computational Biology for Precision Cancer Medicine
The 
Ruijiang Li 
lab at Stanford University School of Medicine is looking for a highly motivated postdoctoral scholar. The major focus of the lab is to develop, validate, and clinically translate diagnostic, prognostic, predictive biomarkers for precision cancer medicine. We integrate datasets of large patient populations and develop novel statistical and machine learning methods.
Our lab is generously funded by 3 active NIH R01 grants. Our work has been published in top clinical journals such as JAMA Oncology, Annals of Surgery, Clinical Cancer Research, Radiology. Major awards to my postdoc trainees include the prestigious NIH K99/R00 Pathway to Independence Award, which provides with $1,000,000 over 5 years to establish an independent research program. The awardee has secured a tenure-track faculty position at MD Anderson Cancer Center. Please visit 
http://med.stanford.edu/lilab
Candidates from a diverse background are encouraged to apply. The applicant may hold a PhD either in math, physical sciences or engineering with a strong motivation to solve biomedical problems, or in biomedical sciences with a strong interest to apply computational approaches. The ideal candidates will have strong analytic and computational skills, as well as prior research experience in cancer genomics, epigenomics, transcriptomics, or multi-omic data integration. Basic knowledge in molecular biology or tumor immunology is helpful.
Stanford University is located at the heart of Silicon Valley, epicenter of the technology revolution in biomedicine. This is an excellent opportunity not only for those motivated to pursue an academic career, but also for those interested in entrepreneurship with the goal of commercialization and translation of new technology into clinical practice.
Interested applicants should send a research statement, CV, and names of three references to:
Ruijiang Li, PhD
Email:
rli2@stanford.edu
請在應聘材料上注明此職位信息來源於BioArt。
溫馨提示
:
BioArt原則上每年可為每個課題組免費發布一次博后招聘廣告,博后廣告請直接將word文檔發送到
sinobioart@bioart.com.cn
或加微信ID:
bioartbusiness 
。
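
The same idea can be sketched outside the editor. The Python snippet below is a minimal illustration, not part of the original workflow: it splits the source on the pattern <[^>]+>, which keeps everything the pattern does not match (the counterpart of Find All followed by Invert Selection), and then turns &nbsp; into an ordinary space. The function name extract_text and the shortened sample string are hypothetical, and unlike the Sublime Text result it also trims surrounding whitespace from each fragment. The pattern assumes that no attribute value contains a ">" character and does nothing special for comments or script blocks.

```python
import html
import re
from typing import List

# Same pattern as in the Sublime Text search box: "<", one or more
# characters that are not ">", then ">".
TAG_PATTERN = re.compile(r"<[^>]+>")


def extract_text(source: str) -> List[str]:
    """Return the non-empty text fragments found between HTML tags.

    Hypothetical helper for illustration only.
    """
    # Splitting on the tag pattern keeps what the pattern does NOT match,
    # which mirrors "Find All" followed by "Invert Selection".
    fragments = TAG_PATTERN.split(source)

    cleaned = []
    for fragment in fragments:
        # html.unescape decodes entities; &nbsp; becomes U+00A0,
        # which is then replaced by an ordinary space.
        text = html.unescape(fragment).replace("\u00a0", " ").strip()
        if text:
            cleaned.append(text)
    return cleaned


# Shortened, hand-made sample in the style of the div content above.
sample = (
    '<p style="line-height: 1.75em;"><span>The&nbsp;</span>'
    "<strong>Ruijiang Li&nbsp;</strong>"
    "<span>lab at Stanford</span></p><p><br  /></p>"
)

for line in extract_text(sample):
    print(line)
```

Each fragment is printed on its own line, in the same way the inverted selection pastes one fragment per line. For arbitrary HTML, a real parser is generally more robust than a single pattern; the regular expression works here because the input is a small, well-behaved snippet.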

