Perl: Counting how many times each word occurs in a text (NVIDIA 2019 written test)


1. Original problem

 

 

2. Perl script

use strict;
use warnings;

my $file = 'anna-karenina.txt';

print "================ Method 1=====================\n";
open(my $in, '<', $file) or die "Cannot open $file: $!";
my %counts;
while (my $line = <$in>) {
        chomp $line;
        $line =~ s/[.,?!;:'"(){}\[\]]/ /g;   # replace periods, commas, and other punctuation with spaces
        #print("$line\n");
        # split ' ' skips leading whitespace, so no empty "word" is counted
        foreach my $word (split ' ', $line) {
                $counts{lc($word)}++;        # store each lowercased word in the hash
        }
}
close $in;

foreach my $word (sort keys %counts) {
        print "$word,$counts{$word}\n";      # print each word with its count
}


print "================ Method 2=====================\n";
open($in, '<', $file) or die "Cannot open $file: $!";
my %words;
while (my $line = <$in>) {
        #map { $words{lc($_)}++ } $line =~ /(\w+)/g;   # equivalent to the loop below

        #print($line =~ /(\w+)/g);
        foreach ($line =~ /(\w+)/g) {        # match each word on the line
                #print("$_\n");
                $words{lc($_)}++;
        }
}
close $in;

for (sort keys %words) {
    print "$_: $words{$_}\n";
}
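
The counting logic can also be tried without the input file. The following is a minimal sketch, not part of the original script: count_words is a hypothetical helper name, and it assumes the same \w+ matching as Method 2, applied to the two test lines embedded as a string.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): count the words in a
# string using the same \w+ matching as Method 2. Returns a hash reference
# mapping each lowercased word to its count.
sub count_words {
    my ($text) = @_;
    my %count;
    $count{lc $_}++ for $text =~ /(\w+)/g;
    return \%count;
}

# The two test lines, embedded so that no input file is needed.
my $text = <<'END';
All happy families resemble one another; every unhappy family is unhappy in its own way.
All was confusion in the house of Oblonskys. happy? happy: [happy] {happy} "happy" 'happy'
END

my $count = count_words($text);
print "$_: $count->{$_}\n" for sort keys %$count;
```

Keeping the counting in a subroutine that takes a string makes it easy to check against short inputs before running it on a whole book.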

 

3. Results

1) Test text

All happy families resemble one another; every unhappy family is unhappy in its own way.
All was confusion in the house of Oblonskys. happy? happy: [happy] {happy} "happy" 'happy'

2) Output

================ Method 1=====================
all,2
another,1
confusion,1
every,1
families,1
family,1
happy,7
house,1
in,2
is,1
its,1
oblonskys,1
of,1
one,1
own,1
resemble,1
the,1
unhappy,2
was,1
way,1
================ Method 2=====================
all: 2
another: 1
confusion: 1
every: 1
families: 1
family: 1
happy: 7
house: 1
in: 2
is: 1
its: 1
oblonskys: 1
of: 1
one: 1
own: 1
resemble: 1
the: 1
unhappy: 2
was: 1
way: 1

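4. Key points

1) To replace any of several characters at once, list them in a bracketed character class (the dot and the other punctuation marks need no backslash inside the brackets):

  $line =~ s/[.,?!;:'"(){}\[\]]/ /g; # replace periods, commas, and other punctuation with spaces

2) Lowercase each word with lc and count it in a hash:

  $counts{lc($word)}++;  # store each lowercased word in the hash

3) %counts refers to the hash as a whole, keys %counts returns its keys, and sort puts them in order:

  sort keys %counts

4) Method 2 uses  $line =~ /(\w+)/g  in list context, which returns every word matched on the line as a list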
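
The two tokenizing approaches are not always interchangeable. The sketch below is an illustration only: the sample strings are made up, and the descending sort by count is an extension that the original script does not perform. It shows that the character class from point 1 keeps a hyphenated word together while \w+ from point 4 splits it, and that the sort from point 3 can be given a comparison block to order words by frequency.

```perl
use strict;
use warnings;

# Made-up sample line containing an apostrophe, a hyphen, and an underscore.
my $line = q{Don't panic: it's a well-known (happy) family_name!};

# Point 1: replace the listed punctuation with spaces, then split on whitespace.
(my $clean = $line) =~ s/[.,?!;:'"(){}\[\]]/ /g;
my @by_class = map { lc } split ' ', $clean;

# Point 4: a /g match in list context returns every \w+ run.
my @by_match = map { lc } $line =~ /(\w+)/g;

print scalar(@by_class), " tokens: @by_class\n";   # "well-known" stays one token
print scalar(@by_match), " tokens: @by_match\n";   # "well" and "known" are separate

# Points 2 and 3: count with a hash, then sort by count (highest first),
# breaking ties alphabetically. This ordering is an assumption for the demo.
my %counts;
$counts{lc $_}++ for 'the cat and the hat and the bat' =~ /(\w+)/g;
my @by_freq = sort { $counts{$b} <=> $counts{$a} || $a cmp $b } keys %counts;
print "$_: $counts{$_}\n" for @by_freq;
```

Neither approach treats an apostrophe as part of a word, so "Don't" is counted as "don" and "t" either way.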

 

