Python text processing, data mining: stop-word search


What the program does:

1. The stop words are stored in a CSV file.

2. The source text is a .txt file.

3. Text processing: remove every stop word that appears in the source file.
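The three steps above boil down to: load a stop-word list, split the text into words, and drop every word that is on the list. Below is a minimal sketch of the filtering step, assuming case-insensitive matching and a regex tokenizer. `remove_stop_words` and the sample sentence are hypothetical illustrations, not the author's code.

```python
import re

# In Python 3, \w matches Unicode letters by default, so accented
# Spanish words such as "está" or "Álamo" stay in one piece.
WORD_RE = re.compile(r"\w+")

def remove_stop_words(text, stop_words):
    """Return the words of `text` that are not stop words (case-insensitive)."""
    stop = {w.lower() for w in stop_words}  # set gives O(1) average lookups
    return [w for w in WORD_RE.findall(text) if w.lower() not in stop]

if __name__ == "__main__":
    stop_words = ["el", "la", "de", "en", "y"]
    text = "El Álamo está en la ciudad de San Antonio y es famoso."
    print(remove_stop_words(text, stop_words))
```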

Implementation:

1. Reading the files, tokenizing, and counting word frequencies in the source file

Reading Spanish text in Python: both files here are opened with encoding='ISO-8859-1'.
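ISO-8859-1 (Latin-1) stores each of the Spanish characters á, é, í, ó, ú, ñ, ü, ¿ and ¡ in a single byte, so a file saved in that encoding must be decoded with it. The short illustration below uses the word "año" to show why the codec matters. The same text has different bytes in Latin-1 and in UTF-8. Decoding Latin-1 bytes as UTF-8 raises an error, and decoding UTF-8 bytes as Latin-1 quietly produces garbled text.

```python
word = "año"

# Latin-1 uses one byte for "ñ"; UTF-8 uses two.
latin1_bytes = word.encode("iso-8859-1")
utf8_bytes = word.encode("utf-8")
print(latin1_bytes)  # b'a\xf1o'
print(utf8_bytes)    # b'a\xc3\xb1o'

# Decoding with the matching codec gives the original text back.
assert latin1_bytes.decode("iso-8859-1") == word

# Decoding Latin-1 bytes as UTF-8 fails: 0xF1 is not followed by a valid
# UTF-8 continuation byte.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("UTF-8 decode failed:", exc.reason)

# Latin-1 maps every byte to some character, so reading a UTF-8 file
# as ISO-8859-1 never raises. It just produces mojibake.
print(utf8_bytes.decode("iso-8859-1"))  # aÃ±o
```

Because ISO-8859-1 accepts every byte, a wrong guess shows up as garbled characters, not as an exception. After reading, check a few accented words. If the files were actually saved as UTF-8, switch to encoding='utf-8'.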

import csv
import os

# upload_path (the directory holding the uploaded files) is assumed to be defined elsewhere in the app

# Read the CSV file; the stop words are Spanish, hence the ISO-8859-1 encoding
def csvfile():
    file_path = os.path.join(upload_path, "SpanishStopWords.csv")
    # newline='' is how the csv module expects files to be opened
    with open(file_path, 'r', encoding='ISO-8859-1', newline='') as f:
        # DictReader uses the first row as the header, i.e. the keys of each row's dict
        csv_reader = csv.DictReader(f)
        data1 = []
        for row in csv_reader:
            data1.append(dict(row))
        return data1

# Read the txt file and count word frequencies
def eachcount():
    file_path = os.path.join(upload_path, "Alamo.txt")
    with open(file_path, 'r', encoding='ISO-8859-1') as f:
        txt = f.read()
    # Tokenize: turn commas and periods into spaces, then split on whitespace
    txt = txt.replace(',', ' ').replace('.', ' ')
    words = txt.split()
    counts = {}  # word -> number of occurrences
    for word in words:
        counts[word] = counts.get(word, 0) + 1  # get() returns 0 if the word has not been seen yet
    items = list(counts.items())
    # Sort the (word, count) pairs by count, highest first
    items.sort(key=lambda x: x[1], reverse=True)
    return items
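The same two steps can be written with `csv.DictReader` and `collections.Counter`. Below is a self-contained sketch that runs on in-memory strings instead of the real files. It assumes the stop-word CSV has a single column with a header named `word`. The sample strings, the `word` header, and the helper names are all hypothetical. The regex tokenizer is a swapped-in alternative to the `replace(',', ' ').replace('.', ' ')` approach, so it also splits on ¿, ¡, ; and :.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical in-memory stand-ins for SpanishStopWords.csv and Alamo.txt
STOPWORDS_CSV = "word\nel\nla\nde\nen\n"
SOURCE_TEXT = "El fuerte de El Álamo está en Texas. La misión, el fuerte y la historia."

def load_stop_words(f):
    """Read a one-column CSV whose header row is 'word' (assumed layout)."""
    reader = csv.DictReader(f)  # the header row supplies the dict keys
    return {row["word"].strip().lower() for row in reader if row["word"].strip()}

def word_counts(text):
    """Count lower-cased words; the regex also splits on Spanish punctuation."""
    return Counter(w.lower() for w in re.findall(r"\w+", text))

if __name__ == "__main__":
    stop_words = load_stop_words(io.StringIO(STOPWORDS_CSV))
    counts = word_counts(SOURCE_TEXT)
    # Drop stop words, then list the most frequent remaining words.
    # most_common() orders by count, highest first (ties keep first-seen order).
    content = Counter({w: n for w, n in counts.items() if w not in stop_words})
    for word, n in content.most_common(3):
        print(word, n)
```

With real files, pass an open file object (opened with `newline=''` and the right encoding) to `load_stop_words` in place of the `StringIO`.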

2. Display every stop word that appears in the source file

 
         
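import os
from flask import render_template

# Show every stop word that occurs in the source file.
# Assumes `application` (the Flask app), `upload_path`, and `docu2()` are defined elsewhere in the app;
# docu2() (not shown) is assumed to return the words of the source file as a list.
@application.route('/listsearch/', methods=['GET', 'POST'])
def listsearch():
    file_path = os.path.join(upload_path, "SpanishStopWords.csv")
    # The CSV is read as plain text and split on whitespace,
    # which works when the file holds one stop word per line
    with open(file_path, 'r', encoding='ISO-8859-1') as f:
        filelist = f.read().split()
    filelist2 = docu2()
    # A set makes each lookup O(1) instead of rescanning filelist2 for every stop word,
    # and a stop word is listed once rather than once per occurrence in the source
    source_words = set(filelist2)
    result2 = [j for j in filelist if j in source_words]
    return render_template('index.html', result2=result2)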
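The original nested loop compares every stop word with every source word. An order-preserving alternative uses a set of the source words. The sketch below assumes matching should ignore case, which the route above does not do. `stop_words_in_source` and the sample lists are hypothetical.

```python
def stop_words_in_source(stop_words, source_words):
    """Return each stop word that occurs in source_words, once, in stop-list order.

    Building a set of source words makes each membership test O(1) on average,
    so the whole check is O(len(stop_words) + len(source_words)) instead of the
    nested loop's O(len(stop_words) * len(source_words)).
    """
    present = {w.lower() for w in source_words}
    seen = set()
    found = []
    for w in stop_words:
        key = w.lower()
        if key in present and key not in seen:
            seen.add(key)
            found.append(w)
    return found

if __name__ == "__main__":
    stop_words = ["el", "la", "de", "los", "en"]
    source_words = ["El", "fuerte", "de", "la", "misión", "de", "San", "Antonio"]
    print(stop_words_in_source(stop_words, source_words))  # ['el', 'la', 'de']
```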

Front-end template (index.html):
<form action="/listsearch/" method="get">
<button type="submit" value="submit">search</button>

<p>result</p>
{% for line2 in result2 %}
<p>{{ line2}}</p>

{% endfor %}
</form>

