Interview Question (3)

Reposted article. 2017-06-03 22:47. Python interview questions and analysis

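Write a program that performs the following steps in order:

1. Download the content of the page https://en.wikipedia.org/wiki/Machine_translation and save it as mt.html.

The page must be downloaded by code you write.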
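
Because the requirements restrict the solution to Python's built-in libraries, the download step can be sketched with urllib.request from the standard library. This is only a minimal sketch under stated assumptions: the function name download_page and the User-Agent value are hypothetical and were chosen for illustration.

```python
import urllib.request


def download_page(url, path, user_agent="mt-exercise/0.1"):
    """Fetch url and save the raw response bytes to path.

    download_page and the default User-Agent value are hypothetical
    names used only in this sketch.
    """
    # Some servers reject requests that carry no User-Agent header.
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as response:
        data = response.read()
    # Write bytes so the saved file matches what the server sent.
    with open(path, "wb") as f:
        f.write(data)
    return len(data)
```

Calling download_page("https://en.wikipedia.org/wiki/Machine_translation", "mt.html") would then produce mt.html in the current directory, assuming the network request succeeds.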

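2. Count every word that appears inside the <p> tags of mt.html, together with the number of times it occurs, and store the result in mt_word.txt.

mt_word.txt must meet the following requirements:

a) One word per line: the word first, then its number of occurrences, separated by a Tab (\t).

b) Words are ordered by count, from most to least frequent. For example, if word a occurs 100 times and word b occurs 10 times, a must come before b.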
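
One way to approach this step with the standard library alone is html.parser.HTMLParser for extracting the <p> text and collections.Counter for the tally. The sketch below makes several assumptions: every <p> is properly closed, words are split on whitespace and compared case-sensitively, and a small set of surrounding punctuation is stripped. The names ParagraphText, count_words, and write_counts are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect the text that appears inside <p> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # greater than 0 while inside a <p>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self.depth > 0:
            self.depth -= 1
            # Separate paragraphs so their last and first words do not merge.
            self.chunks.append(" ")

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def count_words(html):
    """Return a Counter of whole words found inside <p> tags."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    counts = Counter()
    for token in "".join(parser.chunks).split():
        word = token.strip(',.()"/;:')  # assumed punctuation set
        if word:
            counts[word] += 1
    return counts


def write_counts(counts, path):
    """Write 'word<TAB>count' lines, most frequent word first."""
    with open(path, "w", encoding="utf-8") as f:
        for word, n in counts.most_common():
            f.write(f"{word}\t{n}\n")
```

Counter.most_common() already returns the entries ordered from the highest count to the lowest, which matches requirement b) without a separate sort.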

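3. Extract all the year information in mt.html (for example, four-digit numbers in the page such as 1629 and 1951 are years) and store it in mt_year.txt.

mt_year.txt must meet the following requirements:

a) One year per line.

b) Years are ordered from past to present. For example, if the article contains 2007 and 1997, then 1997 must come before 2007.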
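
The year step can be sketched with the re module. The sketch assumes that a "year" is any run of exactly four digits that is not part of a longer number or identifier, and that each year is written only once; the task statement does not say whether duplicates should be kept. The names YEAR_PATTERN, extract_years, and write_years are hypothetical.

```python
import re

# \b on both sides rejects four digits embedded in a longer number or word.
YEAR_PATTERN = re.compile(r"\b\d{4}\b")


def extract_years(text):
    """Return the distinct four-digit numbers in text, oldest first."""
    # Four-character digit strings sort the same way as their numeric values.
    return sorted(set(YEAR_PATTERN.findall(text)))


def write_years(years, path):
    """Write one year per line."""
    with open(path, "w", encoding="utf-8") as f:
        for year in years:
            f.write(f"{year}\n")
```

For instance, extract_years applied to text containing 2007 and 1997 places 1997 first, as requirement b) asks.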

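Requirements:

1. Python only, and only functions or libraries that ship with Python may be used.

2. Submit the executable program together with mt.html, mt_word.txt, and mt_year.txt.

3. The task must be completed within one hour.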

Answer

import re
from collections import Counter

# requests and bs4 (with the lxml parser) are third-party packages and must be installed separately
import requests
from bs4 import BeautifulSoup

# 1. Download the content of https://en.wikipedia.org/wiki/Machine_translation and save it as mt.html. The page must be downloaded by code.
session = requests.session()
response = session.get(url="https://en.wikipedia.org/wiki/Machine_translation")
with open('mt.html', 'wb') as f:
    f.write(response.content)


# 2. Count every word inside the <p> tags of mt.html and store the counts in mt_word.txt

# Parse the page and collect the text of every <p> tag
soup = BeautifulSoup(response.text, features="lxml")
list_p = [tag.get_text() for tag in soup.find_all(name='p')]

# Join all of the text into a single string
str_p = ' '.join(list_p)

# Count whole words only; str.count() would also count substrings
# (for example, 'a' inside 'translation')
word_counter = Counter()
for word in str_p.split():
    word = word.strip(',.()""/; ')
    if word == '':
        continue
    word_counter[word] += 1

# Order the words by count, highest first, then write them to the file
with open('mt_word.txt', 'w', encoding='utf-8') as f:
    for word, count in word_counter.most_common():
        f.write(word + '\t' + str(count) + '\n')

# 3. Extract all the year information in mt.html (four-digit numbers such as 1629 and 1951) and store it in mt_year.txt
# \b keeps the pattern from matching four digits inside a longer number
year_pattern = re.compile(r'\b\d{4}\b')
years_list = sorted(set(year_pattern.findall(response.text)))
with open('mt_year.txt', 'w', encoding='utf-8') as f:
    for year in years_list:
        f.write(year + '\n')
