Counting English word occurrences with Python [small example]

This article is a repost. 2018-03-14 15:00, Python learning

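# You have a directory holding a month of diary entries, all .txt files. To sidestep word segmentation, assume the content is entirely in English. Find the words you consider most important in each diary entry.
1.txt: i love you beijing
2.txt: i love you beijing hello world
3.txt: today is a good day
Source code: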

import os
import re

def find_word(file_path):
    file_list = os.listdir(file_path)  # names of the entries in the directory
    word_dict = {}
    # The r prefix marks a raw string. \w matches any word character, including
    # the underscore; for ASCII text it is equivalent to [A-Za-z0-9_].
    word_re = re.compile(r'\w+')
    for file_name in file_list:
        full_path = os.path.join(file_path, file_name)  # listdir returns bare names, so join them with the directory
        # os.path.splitext('c:\\csv\\test.csv') returns ('c:\\csv\\test', '.csv')
        if os.path.isfile(full_path) and os.path.splitext(file_name)[1] == '.txt':
            try:
                with open(full_path, 'r') as f:
                    data = f.read()
            except (IOError, UnicodeError):  # unreadable or undecodable file
                print('open %s Error' % file_name)
                continue
            # The pattern has no capture groups, so findall() returns a list of whole matches
            words = word_re.findall(data)
            for word in words:
                if word not in word_dict:
                    word_dict[word] = 1  # first occurrence: start the count at 1
                else:
                    word_dict[word] += 1
    # Sort the (word, count) pairs by t[1], the count, in descending order; t[0] would sort by word
    result_list = sorted(word_dict.items(), key=lambda t: t[1], reverse=True)
    for key, value in result_list:
        print('word', key, 'appears %d times' % value)

if __name__ == '__main__':
    find_word('.')

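Result (as printed by Python 2, where print followed by a parenthesised list outputs a tuple; Python 3 prints plain text instead, e.g. word beijing appears 2 times). Words with equal counts may appear in a different order: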

('word', 'beijing', 'appears 2 times')
('word', 'love', 'appears 2 times')
('word', 'i', 'appears 2 times')
('word', 'you', 'appears 2 times')
('word', 'a', 'appears 1 times')
('word', 'good', 'appears 1 times')
('word', 'is', 'appears 1 times')
('word', 'day', 'appears 1 times')
('word', 'world', 'appears 1 times')
('word', 'hello', 'appears 1 times')
('word', 'today', 'appears 1 times')
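
The script above tallies words across every file in the directory, while the task asks about each diary entry separately. A minimal sketch of a per-file variant follows. It assumes that "most important" simply means "most frequent", which the original does not state, and it uses collections.Counter. The name top_words_per_file, the parameter n and the lowercasing step are illustrative choices, not part of the original script.

```python
import os
import re
from collections import Counter

WORD_RE = re.compile(r'\w+')

def top_words_per_file(dir_path, n=1):
    """Return {file name: [(word, count), ...]} for each .txt file in dir_path.

    Hypothetical helper: treats the n most frequent words as the
    "most important" ones, which is an assumption made for illustration.
    """
    result = {}
    for file_name in sorted(os.listdir(dir_path)):
        full_path = os.path.join(dir_path, file_name)
        # Skip sub-directories and anything that is not a .txt file
        if not (os.path.isfile(full_path) and file_name.endswith('.txt')):
            continue
        with open(full_path, 'r') as f:
            # Lowercase first so that "Hello" and "hello" count as one word
            words = WORD_RE.findall(f.read().lower())
        # most_common(n) returns the n (word, count) pairs with the highest counts
        result[file_name] = Counter(words).most_common(n)
    return result
```

Calling top_words_per_file('.') in the diary directory returns one entry per .txt file. In the three sample files every word occurs once within its file, so frequency alone cannot single out a winner there; the sketch is more informative on longer entries.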

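Difference between sort and sorted:

sort is a method of list, whereas sorted can sort any iterable.

The list sort method reorders the existing list in place and returns None, whereas the built-in function sorted returns a new list and leaves the original untouched.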
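
The difference can be checked with a short sketch; the variable names are illustrative only.

```python
numbers = [3, 1, 2]

# sorted() builds and returns a new list; the original is left untouched
ordered = sorted(numbers)
assert ordered == [1, 2, 3]
assert numbers == [3, 1, 2]

# list.sort() reorders the list in place and returns None
returned = numbers.sort()
assert returned is None
assert numbers == [1, 2, 3]

# sorted() accepts any iterable, for example a tuple or a string
from_tuple = sorted((2, 1))
from_string = sorted('cab')
assert from_tuple == [1, 2]
assert from_string == ['a', 'b', 'c']
```

Because sort returns None, writing result = some_list.sort() is a common mistake: result ends up as None, not as the sorted list.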

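sorted syntax (Python 2):

sorted(iterable, cmp=None, key=None, reverse=False)

In Python 3 the cmp parameter was removed; the signature is sorted(iterable, key=None, reverse=False), with key and reverse accepted as keyword arguments only.

Parameters:

iterable -- the iterable to sort.
cmp -- (Python 2 only) a comparison function taking two arguments, both drawn from the iterable. It must return a positive number if the first argument is greater, a negative number if it is smaller, and 0 if the two are equal.
key -- a function of one argument, called on each element of the iterable; the elements are sorted by the values it returns.
reverse -- sort order: reverse = True for descending, reverse = False for ascending (the default).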
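
A brief sketch of these parameters under Python 3, using made-up word counts. Since Python 3 has no cmp argument, an old-style comparison function can be adapted with functools.cmp_to_key; compare_length is a hypothetical function written for this illustration.

```python
from functools import cmp_to_key

# Illustrative data, not taken from a real run
counts = {'beijing': 2, 'hello': 1, 'today': 1, 'love': 2}

# key + reverse: sort (word, count) pairs by count, highest first
by_count = sorted(counts.items(), key=lambda t: t[1], reverse=True)

# Tuple key: count descending, then word ascending, so ties are deterministic
by_count_then_word = sorted(counts.items(), key=lambda t: (-t[1], t[0]))

def compare_length(a, b):
    """Old-style comparison: negative, zero or positive result."""
    return len(a) - len(b)

# cmp_to_key wraps the comparison function so it can be passed as key
by_length = sorted(counts, key=cmp_to_key(compare_length))
```

The tuple key is often the simpler choice when several criteria are involved; cmp_to_key is mainly useful when porting Python 2 code that already has a comparison function.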
