Word Frequency Counting in Java


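Requirements:

1. Read a file;

2. Record each word that appears and how often it appears;

3. Sort by frequency in descending order;

4. Output the results.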

 

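Overview:

1. The file path is hard-coded to make debugging easier; just paste the article or paragraph to be analyzed into that text file;
2. Only English text is supported;
3. Words are listed in descending order of frequency.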

 

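Implementation:

1. Read the file with FileReader and BufferedReader;

2. Split the text into tokens with StringTokenizer;

3. Store the counts in a HashMap;

4. Define a custom class to sort by value;

5. Output the results.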

 

Default file path:

        String filename = "E:/Test.txt";

        // FileReader decodes the file with the platform default charset
        FileReader fk = new FileReader(filename);
        BufferedReader br = new BufferedReader(fk);
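The reading step can also be written so that the reader is always closed, even when an exception occurs. The following is only a sketch under a few assumptions: the class name TextLoader and its method names are made up for illustration, the file is assumed to be UTF-8, and lines are joined with a single space (the snippet above joins them with nothing in between).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of the original project.
final class TextLoader {
    // Reads every line from the source and joins the lines with one space,
    // so the last word of a line never fuses with the first word of the next.
    static String readAll(Reader source) {
        StringBuilder text = new StringBuilder();
        // try-with-resources closes the reader on success and on failure
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append(' ');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    // Opens the file as UTF-8 (an assumption) and reads it completely.
    static String readFile(String filename) {
        try {
            return readAll(Files.newBufferedReader(Paths.get(filename), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this helper, TextLoader.readFile("E:/Test.txt") would return the whole article as one string. StringBuilder also avoids creating a new String on every loop iteration.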

 

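Counting word frequencies:

        String file = "";
        Map<String, Integer> hm = new HashMap<String, Integer>();

        String s;
        while((s = br.readLine()) != null) {
            // Read the whole article into the String file. Lines are joined with no
            // separator, so the last word of one line merges with the first word of
            // the next (this is where "peoplewho" in the results comes from).
            file += s;
        }

        // Delimiters used to split the text. ';' and the typographic apostrophe are
        // not in the set, so tokens such as "go;be" and "don’t" stay in one piece.
        StringTokenizer st = new StringTokenizer(file, " ,.!?\"'");

        while(st.hasMoreTokens()) {
            String word = st.nextToken();
            Integer count = hm.get(word);
            if(count != null) {
                hm.put(word, count + 1); // seen before: increment the count
            }
            else {
                hm.put(word, 1); // first occurrence
            }
        }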
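The counting loop above is case-sensitive and only splits on the listed delimiters, which is why the results further down list "The" and "the" separately and contain tokens such as "go;be". A stricter variant might look like the sketch below. It is not the original project's code: the class name WordCounter is hypothetical, and lower-casing the text, splitting on every non-letter, and keeping apostrophes inside words are all assumptions about what one might want.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of the original project.
final class WordCounter {
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        // Lower-case first, then split on any run of characters that is not
        // a-z, a straight apostrophe, or a typographic apostrophe (\u2019).
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z'\u2019]+")) {
            if (!token.isEmpty()) {           // split may yield a leading empty token
                counts.merge(token, 1, Integer::sum); // insert 1 or add 1 to the old count
            }
        }
        return counts;
    }
}
```

Map.merge replaces the get / null check / put sequence with one call. Because the tokenization rules differ, counts produced this way would not match the result list in this post exactly.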

 

Sorting class:

import java.util.Comparator;
import java.util.TreeMap;

public class ByValueComparator implements Comparator<String> {
    TreeMap<String, Integer> treemap;
    public ByValueComparator(TreeMap<String, Integer> tm) {
        this.treemap = tm;
    }

    @Override
    public int compare(String o1, String o2) {
        // Keys that are missing from the map are treated as equal
        if(!treemap.containsKey(o1) || !treemap.containsKey(o2)) {
            return 0;
        }
        // Arguments are swapped so that larger counts sort first. Integer.compare
        // compares values; == on Integer objects is only reliable for small cached values.
        return Integer.compare(treemap.get(o2), treemap.get(o1));
    }
}
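An alternative to a comparator that looks each key up in the map is to sort the map entries directly. The sketch below assumes Java 8 or later; the class name FrequencySorter is hypothetical, and breaking ties alphabetically is an added assumption (the comparator above returns 0 for equal counts, so it leaves tied words in whatever order they already had).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the original project.
final class FrequencySorter {
    // Returns the entries ordered by count (highest first), then by word.
    static List<Map.Entry<String, Integer>> sortByCountDescending(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(
            Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder())
                .thenComparing(Map.Entry.comparingByKey()));
        return entries;
    }
}
```

Each returned entry carries both the word (getKey()) and its count (getValue()), so printing the list needs no second lookup and no intermediate TreeMap.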

 

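Output:

        // Copy the counts into a TreeMap, which keeps the words in alphabetical order
        TreeMap<String, Integer> tm = new TreeMap<String, Integer>(hm);

        ByValueComparator bvc = new ByValueComparator(tm);
        List<String> ll = new ArrayList<String>(tm.keySet());
        Collections.sort(ll, bvc); // reorder the words by descending count
        for(String str:ll){
            System.out.println(str+"——"+tm.get(str));
        }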

 

Test run:

There are moments in life when you miss someone so much that you just want to pick them from your dreams and hug them for real! Dream what you want to dream;go where you want to go;be what you want to be,because you have only one life and one chance to do all the things you want to do.
May you have enough happiness to make you sweet,enough trials to make you strong,enough sorrow to keep you human,enough hope to make you happy? Always put yourself in others’shoes.If you feel that it hurts you,it probably hurts the other person, too.
The happiest of people don’t necessarily have the best of everything;they just make the most of everything that comes along their way.Happiness lies for those who cry,those who hurt, those who have searched,and those who have tried,for only they can appreciate the importance of people
who have touched their lives.Love begins with a smile,grows with a kiss and ends with a tear.The brightest future will always be based on a forgotten past, you can’t go on well in lifeuntil you let go of your past failures and heartaches.
When you were born,you were crying and everyone around you was smiling.Live your life so that when you die,you're the one who is smiling and everyone around you is crying.
Please send this message to those people who mean something to you,to those who have touched your life in one way or another,to those who make you smile when you really need it,to those that make you see the brighter side of things when you are really down,to those who you want to let them know that you appreciate their friendship.And if you don’t, don’t worry,nothing bad will happen to you,you will just miss out on the opportunity to brighten someone’s day with this message.

  Result:

you——32
to——19
who——9
those——9
the——8
have——7
and——7
of——6
make——6
that——6
want——6
your——4
with——4
when——4
one——4
life——4
a——4
in——4
enough——4
for——3
don’t——3
just——3
it——3
on——3
them——3
their——3
will——3
what——2
were——2
way——2
touched——2
this——2
things——2
so——2
smiling——2
smile——2
really——2
people——2
past——2
only——2
miss——2
message——2
let——2
is——2
hurts——2
go——2
everyone——2
do——2
crying——2
be——2
around——2
are——2
appreciate——2
The——2
another——1
always——1
along——1
all——1
When——1
There——1
Please——1
May——1
Love——1
Live——1
If——1
Happiness——1
Dream——1
And——1
Always——1
die——1
day——1
cry——1
comes——1
chance——1
can’t——1
can——1
brightest——1
brighter——1
brighten——1
born——1
best——1
begins——1
because——1
based——1
bad——1
happen——1
grows——1
go;be——1
future——1
from——1
friendship——1
forgotten——1
feel——1
failures——1
everything;they——1
everything——1
ends——1
dreams——1
dream;go——1
down——1
know——1
kiss——1
keep——1
importance——1
if——1
hurt——1
human——1
hug——1
hope——1
heartaches——1
happy——1
happiness——1
happiest——1
or——1
opportunity——1
nothing——1
need——1
necessarily——1
much——1
most——1
moments——1
mean——1
lives——1
lifeuntil——1
lies——1
side——1
send——1
see——1
searched——1
real——1
re——1
put——1
probably——1
pick——1
person——1
peoplewho——1
out——1
others’shoes——1
other——1
tried——1
trials——1
too——1
they——1
tear——1
sweet——1
strong——1
sorrow——1
something——1
someone’s——1
someone——1
yourself——1
worry——1
where——1
well——1
was——1
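Some entries in the list above, such as "go;be", "re" and "peoplewho", are side effects of how the text is split rather than real words. The sketch below reproduces them on short inputs with the same delimiter string that the StringTokenizer call in this post uses; the class name TokenizerDemo and its method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper, not part of the original project.
final class TokenizerDemo {
    // Same delimiter set as in the post: space , . ! ? " '
    static final String DELIMITERS = " ,.!?\"'";

    // Collects every token that StringTokenizer produces for the given text.
    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, DELIMITERS);
        while (st.hasMoreTokens()) {
            result.add(st.nextToken());
        }
        return result;
    }
}
```

The calls behave as follows:

- tokens("want to go;be what") yields want, to, go;be, what, because ';' is not a delimiter.
- tokens("you're the one") yields you, re, the, one, because the straight apostrophe is a delimiter.
- tokens("of people" + "who have") yields of, peoplewho, have, which is what happens when two lines are joined with no space between them.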

  

Source code: https://coding.net/u/regretless/p/WordFrequencyCount/git

